Extraction of Wikidata Knowledge for the Metadata Formation for Documents of Digital Mathematical Collections
Abstract
Methods are presented for building digital mathematical collections from unstructured sets of documents. These sets comprise materials from scientific conferences and articles from the archives of mathematical journals of the "pre-digital" period.
Using the software tools of the metadata factory of the Lobachevskii-DML digital mathematical library, the mandatory set of metadata was generated for the documents of each digital collection. Methods for extracting knowledge from Wikidata were then used to refine and extend these metadata sets.
A system of SPARQL queries was developed to search Wikidata for information about the documents of the digital collections and their authors. A set of Wikidata entities was defined that determines the features of the search and the subsequent filtering of the results.
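As an illustration only, one query of this kind might be sketched as follows. The abstract does not list the actual queries or entities, so everything here is an assumption: the helper names `build_author_query` and `run_query`, the exact-label match, and the choice of the Wikidata identifiers P31 (instance of), Q5 (human), P106 (occupation), Q170790 (mathematician), P569 (date of birth), P570 (date of death), and P856 (official website).

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_author_query(name: str, lang: str = "en", limit: int = 20) -> str:
    """Build a SPARQL query for humans labelled `name` whose occupation is mathematician.

    Hypothetical helper: the entity set below is an assumed example,
    not the set defined by the authors.
    """
    # Escape backslashes and double quotes so the name is a valid SPARQL string literal.
    literal = name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?person ?birth ?death ?site WHERE {{
  ?person rdfs:label "{literal}"@{lang} ;
          wdt:P31 wd:Q5 ;
          wdt:P106 wd:Q170790 .
  OPTIONAL {{ ?person wdt:P569 ?birth . }}
  OPTIONAL {{ ?person wdt:P570 ?death . }}
  OPTIONAL {{ ?person wdt:P856 ?site . }}
}}
LIMIT {int(limit)}
""".strip()


def run_query(query: str, user_agent: str = "metadata-sketch/0.1 (example)") -> list:
    """Send the query to the Wikidata Query Service and return the result bindings."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # Print the query text only; call run_query(...) to execute it over the network.
    print(build_author_query("Nikolai Lobachevsky"))
```

An exact label match is strict: it misses alternative spellings and transliterations of a name, so a practical pipeline would probably try several name variants or languages per author.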
Methods are proposed for refining and supplementing the bibliographic references given in the articles. When forming the metadata of documents in retro-collections, Wikidata was searched for the authors' years of life and for URLs of web pages with information about the articles and their authors. The results of forming several new digital collections of the Lobachevskii-DML digital library are presented.
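The step of turning query results into years of life can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' procedure: the input is assumed to follow the standard SPARQL JSON results layout with variables named `person`, `birth`, and `death`; the plausibility rule (an author is at least `min_age` years old at publication, with a few years allowed for posthumous publication) is an invented example of filtering; and the sample bindings use placeholder entity URIs rather than real Wikidata items.

```python
import re
from typing import Optional


def year_of(binding: dict, key: str) -> Optional[int]:
    """Extract the year from an xsd:dateTime literal such as '1856-02-24T00:00:00Z'."""
    cell = binding.get(key)
    if cell is None:
        return None
    # Values that are not dates (for example "unknown value" nodes) yield None.
    match = re.match(r"(-?\d{4,})-", cell["value"])
    return int(match.group(1)) if match else None


def plausible_authors(bindings, publication_year, min_age=15, posthumous_years=5):
    """Keep candidates whose years of life are compatible with the publication year.

    Hypothetical filter: the thresholds are illustrative assumptions.
    """
    kept = []
    for binding in bindings:
        birth = year_of(binding, "birth")
        death = year_of(binding, "death")
        if birth is not None and publication_year - birth < min_age:
            continue  # too young, or not yet born, at publication time
        if death is not None and publication_year > death + posthumous_years:
            continue  # published too long after the candidate's death
        kept.append({"entity": binding["person"]["value"], "birth": birth, "death": death})
    return kept


# Hand-written sample in SPARQL JSON results form; the URIs are placeholders.
SAMPLE_BINDINGS = [
    {
        "person": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_PLACEHOLDER_1"},
        "birth": {"type": "literal", "value": "1850-03-01T00:00:00Z"},
        "death": {"type": "literal", "value": "1910-07-15T00:00:00Z"},
    },
    {
        "person": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_PLACEHOLDER_2"},
        "birth": {"type": "literal", "value": "1950-01-01T00:00:00Z"},
    },
    {
        "person": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_PLACEHOLDER_3"},
    },
]

if __name__ == "__main__":
    for candidate in plausible_authors(SAMPLE_BINDINGS, publication_year=1900):
        print(candidate)
```

Candidates without recorded dates are kept rather than dropped, since a missing date in Wikidata says nothing about whether the person is the right author.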
This work is licensed under a Creative Commons Attribution 4.0 International License.
By submitting an article for publication in the Russian Digital Libraries Journal (RDLJ), the authors automatically consent to grant Kazan (Volga) Federal University (KFU) a limited license to use the materials, provided that the article is accepted for publication. This means that KFU has the right to publish the article in the next issue of the journal (on the website or in print), to reprint it in RDLJ archives on CD, and to include it in any information system or database produced by KFU.