Queries to Non-Relational Data Using Natural Language Based on a Large Language Model

Adilbek Omirbekovich Erkimbaev
Vladimir Yurievich Zitserman
George Anatolyevich Kobzev

Abstract

The main purpose of this work is to explore new ways of organizing natural-language queries to local scientific databases that are not relational. A brief review of recent research shows that natural-language queries are being actively introduced into databases of various types, often with the help of machine learning methods such as neural-network algorithms. Over the last two years, large language models have been widely used for query generation across different languages and subject domains. The study examines how the AllegroGraph graph database can use large language models for natural-language search. The functionality of the database is examined using a metadata system for thermophysical properties, represented as the "Thermal" domain ontology. Testing search queries in a bilingual (English and Russian) database environment revealed some general problems, which can be overcome, and the results give good reason to expect future applications of new services based on large language models.

How to Cite
Erkimbaev, A. O., V. Y. Zitserman, and G. A. Kobzev. “Queries to Non-Relational Data Using Natural Language Based on a Large Language Model”. Russian Digital Libraries Journal, vol. 29, no. 1, Feb. 2026, pp. 76-98, doi:10.26907/1562-5419-2026-29-1-76-98.


