Methods for Automatic Assignment of UDC Codes to Mathematical Articles: An Evaluation of Classical and Neural Approaches

Bulat Timurovich Gizatullin
Olga Avenirovna Nevzorova

Abstract

Universal Decimal Classification (UDC) is a hierarchical indexing system in which a publication may be assigned one or several codes. Manual UDC indexing is labor-intensive and often inconsistent. This paper addresses the automatic assignment of UDC codes to Russian-language mathematical research articles. The aim is to compare combinations of text representations and classification models on a unified corpus and to identify the most effective configurations. A corpus of 4194 articles was collected from Math-Net.Ru, including full texts, abstracts, metadata, and UDC codes. The preprocessing pipeline comprised PDF text extraction, removal of layout artifacts, and normalization of UDC labels. We compared TF-IDF, Word2Vec, SciRus-tiny, and SciRus-tiny3.5 representations combined with logistic regression (LogReg), Complement Naive Bayes (CNB), and CatBoost. In both the single-label and multi-label settings, TF-IDF + LogReg performed best, while TF-IDF + CNB was closely competitive. The proposed approach can be used in automatic subject indexing systems for digital libraries and scientific archives, in UDC recommendation tools for authors and editors, and in metadata quality control workflows.
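The label-normalization step and the difference between the single-label and multi-label settings can be illustrated with a small sketch. The abstract does not state the actual normalization rules, so everything here is an assumption made for illustration: the function names (`normalize_udc`, `format_udc`, `to_indicator`), the separator set, the dropping of parenthesized or quoted auxiliaries, the truncation depth, and the choice of the first listed code as the single label are all hypothetical.

```python
import re

# Assumed separators between codes in a raw UDC string. A range written with
# "/" is simply reduced to its two endpoints in this sketch.
_SEPARATORS = re.compile(r"[+:;,/]")
# Assumed auxiliaries to drop: parenthesized or quoted parts such as (091).
_AUXILIARIES = re.compile(r"\(.*?\)|\".*?\"")
# Optional "UDC" / "УДК" prefix in front of the codes.
_PREFIX = re.compile(r"^\s*(UDC|УДК)\s*", flags=re.IGNORECASE)


def format_udc(digits: str) -> str:
    """Re-insert the UDC dot after every third digit: '5192' -> '519.2'."""
    return ".".join(digits[i:i + 3] for i in range(0, len(digits), 3))


def normalize_udc(raw: str, depth: int = 3) -> list[str]:
    """Return the codes found in `raw`, truncated to `depth` digits.

    Order of first appearance is kept and duplicates are removed.
    """
    text = _AUXILIARIES.sub("", _PREFIX.sub("", raw.strip()))
    codes = []
    for part in _SEPARATORS.split(text):
        digits = re.sub(r"\D", "", part)
        if digits:
            codes.append(format_udc(digits[:depth]))
    return list(dict.fromkeys(codes))


def to_indicator(label_lists: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Build a binary article-by-class matrix for the multi-label setting."""
    classes = sorted({code for labels in label_lists for code in labels})
    rows = [[int(code in labels) for code in classes] for labels in label_lists]
    return classes, rows


raw_labels = ["УДК 519.21+517.98", "UDC 51(091):517.9", "519.6"]
multi = [normalize_udc(raw, depth=3) for raw in raw_labels]
single = [labels[0] for labels in multi]  # assumed: first listed code
classes, matrix = to_indicator(multi)
```

A smaller `depth` merges fine-grained codes into broader classes, which trades label specificity for more training examples per class; the depth used in the paper is not stated in the abstract.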

How to Cite
Gizatullin, B. T., and O. A. Nevzorova. “Methods for Automatic Assignment of UDC Codes to Mathematical Articles: An Evaluation of Classical and Neural Approaches”. Russian Digital Libraries Journal, vol. 29, no. 3, June 2026, pp. 699-718, doi:10.26907/1562-5419-2026-29-3-699-718.


