Automatic and Semi-Automatic Methods for Domain Knowledge-Graph Construction and Ontology Expansion

Andrey Petrovich Khalov
Olga Muratovna Ataeva

Abstract

We present a combined pipeline for knowledge-graph construction and ontology expansion. The approach builds a BIO-tagged corpus through fully automatic LLM-based pseudo-annotation and introduces dedicated UNK reserve categories that capture previously unseen classes and relations. A specialized NER/RE model is trained on a 3-million-token dataset with 92 labels. The model exhibits a conservative quality profile (high precision with moderate recall), which suits safe graph enrichment: integrating the extracted facts expands the graph to ~0.98 million triples and raises the expansion ratio (total inferred facts to explicit triples) from 2.65 to 3.52, while preserving logical consistency. UNK label pools are converted into stable synsets, enabling semi-automatic ontology expansion; 12 new classes derived from unstructured texts were added. We also demonstrate practical value for querying and analytics using an LLM + SPARQL setup.
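
As an illustration of the BIO tagging with UNK reserve categories mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. The label names (`Service`, `Incident`), the reserve label `UNK_CLASS`, and the helper functions `to_bio` and `unk_pool` are all hypothetical; the abstract does not list the 92 labels or describe the annotation format in detail. The sketch assumes that any span whose label falls outside the known inventory is routed to a reserve tag rather than discarded, so that it can later be pooled for ontology expansion.

```python
from typing import List, Tuple

# Hypothetical label inventory (assumption: the real 92 labels are not given).
KNOWN_LABELS = {"Service", "Incident"}
# Hypothetical name for a reserve category that absorbs unseen classes.
UNK_LABEL = "UNK_CLASS"


def to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, label) token spans into BIO tags.

    Spans with a label outside KNOWN_LABELS are mapped to UNK_LABEL
    instead of being dropped. Overlapping spans are skipped (first wins).
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end)}")
        if label not in KNOWN_LABELS:
            label = UNK_LABEL
        if any(tag != "O" for tag in tags[start:end]):
            continue  # keep the earlier span, ignore the overlapping one
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def unk_pool(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect the surface forms of all spans tagged with UNK_LABEL."""
    pool: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{UNK_LABEL}":
            if current:
                pool.append(" ".join(current))
            current = [token]
        elif tag == f"I-{UNK_LABEL}" and current:
            current.append(token)
        else:
            if current:
                pool.append(" ".join(current))
                current = []
    if current:
        pool.append(" ".join(current))
    return pool


tokens = ["The", "change", "advisory", "board", "approved", "the", "incident"]
# "Committee" is not a known label here, so its span goes to the reserve tag.
spans = [(1, 4, "Committee"), (6, 7, "Incident")]
tags = to_bio(tokens, spans)
candidates = unk_pool(tokens, tags)
```

In this toy setup the pooled surface forms in `candidates` would be the raw material that a later step groups into synsets; how that grouping is done is outside the scope of this sketch.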

How to Cite
Khalov, A. P., and O. M. Ataeva. “Automatic and Semi-Automatic Methods for Domain Knowledge-Graph Construction and Ontology Expansion”. Russian Digital Libraries Journal, vol. 28, no. 6, Dec. 2025, pp. 1481-1519, doi:10.26907/1562-5419-2025-28-6-1481-1519.
