Queries to Non-Relational Data using Natural Language based on a Large Language Model
Abstract
The main purpose of this work is to explore new opportunities for organizing natural language queries to local scientific databases that are not relational. A brief review of recent research shows that natural language queries are being actively introduced into databases of various types, often with the help of machine learning methods such as neural network algorithms. The review also shows that, over the last two years, large language models have been widely used for query generation in various language settings and fields of expertise. The study examines how the AllegroGraph graph database can use large language models for natural language search. The functionality of the database was examined on a metadata system for thermophysical properties, represented as the "Thermal" domain ontology. Testing search queries in a bilingual (English and Russian) database environment revealed some general problems, which can be overcome, and the results give good hope for the future application of new services based on large language models.

This work is licensed under a Creative Commons Attribution 4.0 International License.
By submitting an article for publication in the Russian Digital Libraries Journal (RDLJ), the authors automatically consent to grant Kazan (Volga) Federal University (KFU) a limited license to use the materials (only if the article is accepted for publication). This means that KFU has the right to publish the article in the next issue of the journal (on the website or in print), to reprint it in RDLJ archives on CD, and to include it in any information system or database produced by KFU.
All copyrighted materials are placed in RDLJ with the consent of the authors. If any author objects to the publication of their materials on this site, the material can be removed upon written notification to the Editor.
Documents published in RDLJ are protected by copyright, and all rights are reserved by the authors. Authors independently monitor compliance with their rights to reproduce or translate their papers published in the journal. If material published in RDLJ is reprinted with permission by another publisher or translated into another language, a reference to the original publication is required.
By submitting an article for publication in RDLJ, authors should take into account that publication on the Internet, on the one hand, provides unique opportunities for access to their content but, on the other hand, is a new form of information exchange in the global information society, in which authors and publishers are not always protected against unauthorized copying or other use of copyrighted materials.
RDLJ is copyrighted. When using materials from the journal, you must indicate the URL: index.phtml page = elbib / rus / journal?. Changes, additions, or edits to the author's text are not allowed. Copying individual fragments of articles from the journal is allowed: others may distribute, remix, adapt, and build upon an article, even commercially, as long as they credit that article for the original creation.
Requests for the right to reproduce or use any of the materials published in RDLJ should be addressed to the Editor-in-Chief, A.M. Elizarov, at the following address: amelizarov@gmail.com.
The publishers of RDLJ are not responsible for the views set out in published opinion articles.
We suggest that authors download the copyright agreement on the transfer of non-exclusive rights to use the work from this page, sign it, and send a scanned copy by e-mail to the journal publisher's address.