Automatic and Semi-Automatic Methods for Domain Knowledge-Graph Construction and Ontology Expansion

Andrey Petrovich Khalov; Olga Muratovna Ataeva

doi:10.26907/1562-5419-2025-28-6-1481-1519

PDF (Russian)

Published: 18.12.2025

UDC 004

Issue

Vol. 28 No. 6 (2025): Special issue "Actual tasks in semantic analysis"

Andrey Petrovich Khalov

Moscow Institute of Physics and Technology, Dolgoprudny, Russia

https://orcid.org/0009-0005-4584-8245

Olga Muratovna Ataeva

Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Moscow, Russia

https://orcid.org/0000-0003-0367-5575

Abstract

We present a combined pipeline for knowledge-graph construction and ontology expansion. The approach builds a BIO-tagged corpus via fully automatic LLM-based pseudo-annotation and introduces dedicated UNK reserve categories to capture previously unseen classes and relations. A specialized NER/RE model is trained on a 3-million-token dataset with 92 labels. The model exhibits a conservative quality profile (high precision with moderate recall), which suits safe graph enrichment: integrating the extracted facts expands the graph to ~0.98 million triples and raises the expansion ratio (the ratio of total inferred facts to explicit triples) from 2.65 to 3.52 while preserving logical consistency. The UNK label pools are converted into stable synsets, enabling semi-automatic ontology expansion; 12 new classes derived from unstructured texts were added to the ontology. We also demonstrate the practical value of the approach for querying and analytics using an LLM + SPARQL setup.
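To make the BIO tagging and UNK reserve categories described above more concrete, here is a minimal Python sketch. It assumes a hypothetical convention in which reserve labels carry an `UNK` prefix (the paper's actual inventory of 92 labels is not listed on this page). The sketch decodes a BIO tag sequence into entity spans, routes spans with UNK labels into per-label pools that could later be clustered into candidate synsets, and computes the expansion ratio as the abstract defines it. The function names, label names, and example sentence are illustrative only. Relation extraction and the synset-building step are not shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical naming convention: reserve (UNK) labels start with this prefix.
UNK_PREFIX = "UNK"

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def decode_bio(tags: List[str]) -> List[Span]:
    """Turn a BIO tag sequence into (type, start, end) spans.

    An I- tag that does not continue a span of the same type opens a new
    span (lenient decoding); malformed tags raise ValueError.
    """
    spans: List[Span] = []
    current: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag == "O":
            if current is not None:
                spans.append((current, start, i))
                current = None
            continue
        prefix, sep, etype = tag.partition("-")
        if prefix not in ("B", "I") or not sep or not etype:
            raise ValueError(f"malformed BIO tag: {tag!r}")
        if prefix == "B" or etype != current:
            # Close the open span (if any) and start a new one at this token.
            if current is not None:
                spans.append((current, start, i))
            current, start = etype, i
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


def split_known_and_unk(
    tokens: List[str], tags: List[str]
) -> Tuple[List[Tuple[str, str]], Dict[str, List[str]]]:
    """Separate spans of known classes from spans in reserve UNK categories.

    UNK surface forms are pooled per label so they can later be clustered
    into candidate synsets (the clustering itself is not shown here).
    """
    known: List[Tuple[str, str]] = []
    unk_pool: Dict[str, List[str]] = defaultdict(list)
    for etype, s, e in decode_bio(tags):
        surface = " ".join(tokens[s:e])
        if etype.startswith(UNK_PREFIX):
            unk_pool[etype].append(surface)
        else:
            known.append((etype, surface))
    return known, dict(unk_pool)


def expansion_ratio(total_inferred: int, explicit: int) -> float:
    """Expansion ratio per the abstract: total inferred facts / explicit triples."""
    if explicit <= 0:
        raise ValueError("explicit triple count must be positive")
    return total_inferred / explicit


if __name__ == "__main__":
    # Illustrative tokens and labels; not taken from the paper's dataset.
    tokens = ["Restart", "nginx", "on", "web-01", "after", "the", "kernel", "patch"]
    tags = ["O", "B-Software", "O", "B-Host", "O", "O", "B-UNK_1", "I-UNK_1"]
    known, unk = split_known_and_unk(tokens, tags)
    print(known)  # [('Software', 'nginx'), ('Host', 'web-01')]
    print(unk)    # {'UNK_1': ['kernel patch']}
```

Treating an orphan `I-` tag as the start of a new span is one common decoding choice. A strict decoder could reject such sequences instead, which may suit a precision-oriented pipeline better.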

Keywords:

ontology, DOLCE, knowledge graph, NER, BIO tagging, RDF/OWL, SPARQL.

How to Cite

Khalov, A. P., and O. M. Ataeva. “Automatic and Semi-Automatic Methods for Domain Knowledge-Graph Construction and Ontology Expansion”. Russian Digital Libraries Journal, vol. 28, no. 6, Dec. 2025, pp. 1481-1519, doi:10.26907/1562-5419-2025-28-6-1481-1519.

This work is licensed under a Creative Commons Attribution 4.0 International License.

By submitting an article for publication in the Russian Digital Libraries Journal (RDLJ), the authors automatically consent to grant Kazan (Volga) Federal University (KFU) a limited license to use the materials (only if the article is accepted for publication). This means that KFU has the right to publish the article in the next issue of the journal (on the website or in print), to reprint it in the RDLJ CD archives, or to include it in any information system or database produced by KFU.

All copyrighted materials are placed in RDLJ with the consent of the authors. If any author objects to the publication of materials on this site, the material can be removed upon written notification to the Editor.

Documents published in RDLJ are protected by copyright, and all rights are reserved by the authors. Authors independently monitor compliance with their rights to reproduce or translate their papers published in the journal. If material published in RDLJ is reprinted with permission by another publisher or translated into another language, a reference to the original publication is required.

By submitting an article for publication in RDLJ, authors should take into account that publication on the Internet, on the one hand, provides unique opportunities for access to their content but, on the other hand, is a new form of information exchange in the global information society, where authors and publishers are not always protected against unauthorized copying or other use of copyrighted materials.

RDLJ is copyrighted. When using materials from the journal, the URL index.phtml?page=elbib/rus/journal must be indicated. No changes, additions, or edits to the author's text are allowed. Copying individual fragments of articles from the journal is permitted: readers may distribute, remix, adapt, and build upon an article, even commercially, as long as they credit the article for the original creation.

Requests for the right to reproduce or use any of the materials published in RDLJ should be addressed to the Editor-in-Chief, A.M. Elizarov, at amelizarov@gmail.com.

The publisher of RDLJ is not responsible for the views expressed in published opinion articles.

We ask the authors to download from this page the copyright agreement on the transfer of non-exclusive rights to use the work, sign it, and send a scanned copy to the journal publisher by e-mail.
