Algorithms for Formation of Metadata Mathematical Retro Collections Based on Analysis of Structural Features of Documents

Abstract

This paper presents solutions to the main problems that arise in forming digital mathematical collections from documents published in the pre-digital period; the paper refers to such collections as retro collections. It gives algorithms for creating metadata descriptions of retro collections, based on analysis of the structure of mathematical documents and on software tools for metadata extraction. It describes the retro collections that were formed with these algorithms and included in the metadata factory of the Lobachevskii-DML digital mathematical library. It also specifies the schemas used to form the metadata and the methods for normalizing the extracted metadata to conform to the schemas and requirements of the integrating mathematical libraries.
