Methods for Automatic Assignment of UDC Codes to Mathematical Articles: An Evaluation of Classical and Neural Approaches

Bulat Timurovich Gizatullin; Olga Avenirovna Nevzorova

doi:10.26907/1562-5419-2026-29-3-699-718

Published: 16.06.2026

UDC 004.912

Issue

Vol. 29 No. 3 (2026)

Bulat Timurovich Gizatullin

Kazan (Volga region) Federal University, Kazan, Russia

https://orcid.org/0009-0000-6251-9260

Olga Avenirovna Nevzorova

Kazan (Volga region) Federal University, Kazan, Russia

https://orcid.org/0000-0001-8116-9446

Abstract

Universal Decimal Classification (UDC) is a hierarchical indexing system in which a publication may be assigned one or several codes. Manual UDC indexing is labor-intensive and often inconsistent. This paper addresses the automatic assignment of UDC codes to Russian-language mathematical research articles. The aim is to compare combinations of text representations and classification models on a unified corpus and to identify the most effective configurations. A corpus of 4194 articles was collected from Math-Net.Ru, including full texts, abstracts, metadata, and UDC codes. The preprocessing pipeline comprised PDF text extraction, removal of layout artifacts, and normalization of UDC labels. We compared TF-IDF, Word2Vec, SciRus-tiny, and SciRus-tiny3.5 representations combined with logistic regression (LogReg), Complement Naive Bayes (CNB), and CatBoost. In both the single-label and multi-label settings, TF-IDF + LogReg achieved the best performance, with TF-IDF + CNB a close second. The proposed approach can be used in automatic subject indexing systems for digital libraries and scientific archives, in UDC recommendation tools for authors and editors, and in metadata quality control workflows.
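The abstract does not spell out how UDC labels were normalized, so the following is only a minimal sketch of what such a step could look like. It assumes that several codes in one record are joined by `+`, `,` or `;`, that an optional `УДК`/`UDC` prefix may be present, and that truncating a code to a fixed number of digits gives its ancestor in the hierarchy. The function name `normalize_udc` and the `depth` parameter are hypothetical and are not taken from the paper.

```python
import re
from typing import List


def normalize_udc(raw: str, depth: int = 3) -> List[str]:
    """Split a raw UDC string into distinct codes truncated to `depth` digits.

    Hypothetical helper: the separators, the prefix handling and the
    truncation rule are assumptions, not the authors' documented procedure.
    """
    labels: List[str] = []
    # Assumption: multiple codes are separated by '+', ',' or ';'.
    for part in re.split(r"[+,;]", raw):
        # Keep only the leading run of digits and dots, skipping an
        # optional "УДК"/"UDC" prefix; auxiliaries after it are ignored.
        match = re.match(r"\s*(?:УДК|UDC)?\s*([\d.]+)", part)
        if not match:
            continue
        # Truncate to the requested number of digits (hierarchy level).
        digits = match.group(1).replace(".", "")[:depth]
        if not digits:
            continue
        # Re-insert the conventional dot after every third digit.
        code = ".".join(digits[i:i + 3] for i in range(0, len(digits), 3))
        # Preserve first-seen order and drop duplicates.
        if code not in labels:
            labels.append(code)
    return labels
```

Under these assumptions, `normalize_udc("УДК 517.95+519.63", depth=5)` yields `["517.95", "519.63"]`, a two-label target for the multi-label setting, while `depth=3` collapses the same record to the coarser classes `["517", "519"]`.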

Keywords:

automatic classification, Universal Decimal Classification, UDC, scientific text processing, machine learning, hierarchical classification, multi-label classification, mathematical texts, digital libraries, text vectorization.

How to Cite

Gizatullin, B. T., and O. A. Nevzorova. “Methods for Automatic Assignment of UDC Codes to Mathematical Articles: An Evaluation of Classical and Neural Approaches”. Russian Digital Libraries Journal, vol. 29, no. 3, June 2026, pp. 699-718, doi:10.26907/1562-5419-2026-29-3-699-718.

This work is licensed under a Creative Commons Attribution 4.0 International License.
