Comparative Analysis of Geological Texts using Large Language Models
Main Article Content
Abstract
The rapid increase in the volume of publications in various fields of geology makes it crucial to introduce methods for automated processing of scientific texts. Large language models based on neural networks represent one of the most promising approaches to solving this challenge. The recent breakthroughs in artificial intelligence have made such models indispensable tools for researchers. Our work on semantic search for publications using additionally trained language models and measuring the similarity between geological texts yielded good results. However, the models we used were unable to perform in-depth text analysis. A comparative analysis of modern architectures identified the DeepSeek R1 model as belonging to a class of systems with advanced logical inference abilities. This type of model represents a fundamentally new level of quality in text generation. Based on the chosen model, we have developed a web service that provides unique functionality for comparative analysis of up to 5 scientific articles. The service supports multilingual sources, allowing users to input text in English, Chinese, Russian, etc. It generates structured reports in Russian, highlighting key theses, contradictions, and patterns. The proposed approach has been tested on geological publications, and the results have been promising.
Article Details
References
https://en.wikipedia.org/wiki/Large_language_model?ysclid=mg7ip9ev9d289421479 (date of access 01.10.2025)
2. Patuk M.I., Naumova V.V. Artificial Intelligence Methods for Scientific Research in Geology // Russian Digital Libraries Journal. 2023. Vol. 26, No. 5. P. 673–696. (In Russ.). https://doi.org/10.26907/1562-5419-2023-26-5-673-696
3. Patuk M.I., Naumova V.V. Using Semantic Search to Select and Rank Geological Publications // Automatic Documentation and Mathematical Linguistics. 2024. Vol. 58, Suppl. 5. P. S294–S298. https://doi.org/10.3103/S0005105525700372
4. Patuk M.I., Naumova V.V., Eryomenko V.S. Digital repository "geologyscience.ru": open access to scientific publications on russian geology // Russian Digital Library Journal. 2020. Vol. 23, No. 6. P. 1324–1338 (in Russian).
5. Kilizhekov O.K., Tolstov A.V., Yakhin Sh.M., Zyryanov I.V. Diamond deposit of the Mir kimberlite pipe: main research stages, specific features and results of exploration // Russian Mining Industry. 2025. No. 1. P. 49–56 (In Russ.).
https://doi.org/10.30686/1609-9192-2025-1-49-56
6. Shigley J., Chapman J., Ellison R. Discovery and Mining of the Argyle Diamond Deposit, Australia // Gems and Gemology. 2001. Vol. 37. P. 26–41. https://doi.org/10.5741/GEMS.37.1.26
7. ChatGPT.
URL: https://en.wikipedia.org/wiki/ChatGPT?ysclid=mg7j88jx9q883735240 (date of access 01.10.2025)
8. Picazo-Sanchez P., Ortiz-Martin L. Analysing the impact of ChatGPT in research // Applied Intelligence. 2024. Vol. 54. P. 4172–4188.
https://doi.org/10.1007/s10489-024-05298-0
9. Islam I., Islam M.N. Exploring the opportunities and challenges of ChatGPT in academia // Discover Education. 2024. Vol. 3. Article no. 31. https://doi.org/10.1007/s44217-024-00114-w
10. Faiza Farhat F., Sohail Sh. S., Dag Øivind Madsen D.Ø. How trustworthy is ChatGPT? The case of bibliometric analyses // Cogent Engineering. 2023. Vol. 10. Article no. 2222988. https://doi.org/10.1080/23311916.2023.2222988
11. Zashikhina I.M. Scientific Article Writing: Will ChatGPT Help? Vysshee obrazovanie v Rossii // Higher Education in Russia. 2023. Vol. 32, no. 8. P. 24–47.
https://doi.org/10.31992/0869-3617-2023-32-8-9-24-47 (In Russ., abstract in Eng.)
12. Hallucination (artificial intelligence). URL: https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence) (date of access 01.10.2025)
13. Salvagno M., Taccone F.S., Gerli A.G. Can artificial intelligence help for scientific writing? // Critical Care. 2023. Vol. 27. Article no. 75.
https://doi.org/10.1186/s13054-023-04380-2
14. Ghorbanfekr H., Kerstens P.J., Dirix K. Classification of geological borehole descriptions using a domain adapted large language model // Applied Computing and Geosciences. 2025. Vol. 25. Article no. 100229.
15. LLM Leaderboard.
https://artificialanalysis.ai/leaderboards/models (date of access 01.10.2025)
16. T-lite. https://huggingface.co/t-tech/T-lite-it-1.0-Q8_0-GGUF (date of access 01.10.2025)
17. GigaChat. https://giga.chat/ (date of access 01.10.2025)
18. DeepSeek. https://www.deepseek.com/en (date of access 01.10.2025)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Presenting an article for publication in the Russian Digital Libraries Journal (RDLJ), the authors automatically give consent to grant a limited license to use the materials of the Kazan (Volga) Federal University (KFU) (of course, only if the article is accepted for publication). This means that KFU has the right to publish an article in the next issue of the journal (on the website or in printed form), as well as to reprint this article in the archives of RDLJ CDs or to include in a particular information system or database, produced by KFU.
All copyrighted materials are placed in RDLJ with the consent of the authors. In the event that any of the authors have objected to its publication of materials on this site, the material can be removed, subject to notification to the Editor in writing.
Documents published in RDLJ are protected by copyright and all rights are reserved by the authors. Authors independently monitor compliance with their rights to reproduce or translate their papers published in the journal. If the material is published in RDLJ, reprinted with permission by another publisher or translated into another language, a reference to the original publication.
By submitting an article for publication in RDLJ, authors should take into account that the publication on the Internet, on the one hand, provide unique opportunities for access to their content, but on the other hand, are a new form of information exchange in the global information society where authors and publishers is not always provided with protection against unauthorized copying or other use of materials protected by copyright.
RDLJ is copyrighted. When using materials from the log must indicate the URL: index.phtml page = elbib / rus / journal?. Any change, addition or editing of the author's text are not allowed. Copying individual fragments of articles from the journal is allowed for distribute, remix, adapt, and build upon article, even commercially, as long as they credit that article for the original creation.
Request for the right to reproduce or use any of the materials published in RDLJ should be addressed to the Editor-in-Chief A.M. Elizarov at the following address: amelizarov@gmail.com.
The publishers of RDLJ is not responsible for the view, set out in the published opinion articles.
We suggest the authors of articles downloaded from this page, sign it and send it to the journal publisher's address by e-mail scan copyright agreements on the transfer of non-exclusive rights to use the work.