Analysis of Word Embeddings for Semantic Role Labeling of Russian Texts

Leysan Maratovna Kadermyatova; Elena Victorovna Tutubalina

doi:10.26907/1562-5419-2020-23-5-1026-1043

Published: 23.08.2020

UDC 004.41 004.02

DOI: https://doi.org/10.26907/1562-5419-2020-23-5-1026-1043

Issue

Vol. 23 No. 5 (2020)

Leysan Maratovna Kadermyatova

Higher Institute of Information Technology and Intelligent Systems, Kazan Federal University

Elena Victorovna Tutubalina

Higher Institute of Information Technology and Intelligent Systems, Kazan Federal University

Abstract

A large body of work addresses semantic role labeling of English texts [1–3]. Semantic role labeling of Russian texts, however, remained largely unexplored for many years because of the lack of training and test corpora, and became an active research topic only after the FrameBank corpus appeared [4]. In this work, we analyze how the choice of word embedding model affects the quality of semantic role labeling of Russian texts. We computed micro- and macro-averaged F1 scores for the word2vec [5], fastText [6], and ELMo [7] embedding models. The experiments showed that, on the Russian FrameBank corpus, fastText models performed slightly better on average than word2vec models. The deep contextualized word representation model ELMo achieved higher micro- and macro-F1 scores than the classical shallow embedding models.
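The two averaging schemes named in the abstract can give quite different numbers when role labels are imbalanced. The sketch below is only an illustration of how micro- and macro-averaged F1 are commonly computed for single-label classification. It is not the authors' evaluation code, and the role labels and predictions are made-up toy values rather than FrameBank data.

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1) for single-label role classification."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed

    def f1(true_pos, false_pos, false_neg):
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Hypothetical toy data: the label names are illustrative only.
gold = ["agent", "agent", "agent", "patient", "theme"]
pred = ["agent", "agent", "patient", "patient", "agent"]
micro, macro = f1_scores(gold, pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case the rare label is never predicted correctly, so the macro average falls well below the micro average. That is why reporting both is informative for corpora with many infrequent roles.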

Keywords:

machine learning, ML-model, natural language processing, word embedding, semantic role labeling.

How to Cite

Kadermyatova, L. M., and E. V. Tutubalina. “Analysis of Word Embeddings for Semantic Role Labeling of Russian Texts”. Russian Digital Libraries Journal, vol. 23, no. 5, Aug. 2020, pp. 1026-43, doi:10.26907/1562-5419-2020-23-5-1026-1043.

Author Biographies

Leysan Maratovna Kadermyatova

Postgraduate student of the Higher School of Information Technologies and Intelligent Systems at Kazan Federal University, QA Engineer.

Elena Victorovna Tutubalina

Candidate of Physical and Mathematical Sciences, senior researcher at the Higher School of Information Technologies and Intelligent Systems, Kazan Federal University. Research interests include natural language processing, machine learning, and medical informatics.

References

Christensen J., Mausam, Soderland S., and Etzioni O. (2011), An analysis of open information extraction based on semantic role labeling. In Proceedings of the sixth international conference on Knowledge capture, pp. 113–120.

Sameer Pradhan, Wayne Ward, Kadri Hacioglu, James Martin, and Dan Jurafsky. 2005. Semantic role labeling using different syntactic views. In Proceedings of the Association for Computational Linguistics 43rd annual meeting (ACL-2005), Ann Arbor, MI.

Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2017. Deep semantic role labeling: What works and what's next. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 473–483.

Olga Lyashevskaya and Egor Kashkin. 2015. FrameBank: a database of Russian lexical constructions. In International Conference on Analysis of Images, Social Networks and Texts, pages 350–360.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.

Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2227–2237.

Baker C. F., Fillmore C. J., and Lowe J. B. (1998), The Berkeley FrameNet project. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics Volume 1, pp. 86–90.

Ilya Kuznetsov. 2016. Automatic semantic role labelling in Russian language (in Russian). Ph.D. thesis, Higher School of Economics.

Shelmanov A., Smirnov I., Larionov D., Chistova E. Semantic Role Labeling with Pretrained Language Models for Known and Unknown Predicates // Proceedings of Recent Advances in Natural Language Processing, pages 619–628, Varna, Bulgaria, Sep 2–4, 2019.

Andrey Kutuzov and Elizaveta Kuzmenko, 2017. WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models, pages 155–161. Springer.

Khakhulin, Yuri Kuratov, Denis Kuznetsov, et al. 2018. DeepPavlov: Open-source library for dialogue systems. In Proceedings of ACL 2018, System Demonstrations, pages 122–127.

Shelmanov A., Devyatkin D. Semantic role labeling with neural networks for texts in Russian // Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference "Dialogue" (2017). — Vol. 1. — 2017. — P. 245–256.

Agarap, A. F. 2018. Deep Learning using Rectified Linear Units (ReLU), Neural and Evolutionary Computing, Vol. 1.

Luheng He, Mike Lewis, and Luke Zettlemoyer. Question-answer driven semantic role labeling: Using natural language to annotate natural language. In Proceedings of the 2015 conference on empirical methods in natural language processing (EMNLP 2015), pages 643–653, 2015.

Wen Tau Yih, Matthew Richardson, Chris Meek, Ming Wei Chang, and Jina Suh. The value of semantic parse labeling for knowledge base question answering. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), pages 201–206, 2016.

Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2010. Semantic role labeling for open information extraction. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading. Association for Computational Linguistics, Los Angeles, California, pages 52–60.

GS Osipov, IV Smirnov, and IA Tikhomirov. 2010. Relational-situational method for text search and analysis and its applications. Scientific and Technical Information Processing, 37(6):432–437.

Liu, D., Gildea, D., 2010. Semantic role features for machine translation. Proc. 23rd Int. Conf. on Computational Linguistics, pp. 716–724.

Kashkin, E.V., Lyashevskaya, O.N.: Semantic roles and construction net in Russian FrameBank [Semanticheskie roli i set’ konstrukcij v sisteme FrameBank] (in Russian). In: Computational Linguistics and Intellectual Technologies. Proceedings of International Conference “Dialog”, vol. 12-1, pp. 297–311. RSUH, Moscow (2013)

Lyashevskaya O. N., Kashkin E. V. Evaluation of frame-semantic role labeling in a case-marking language // Papers from the Annual International Conference "Dialogue" (2014). — 2014. — P. 350–365.
