Analysis of Word Embeddings for Semantic Role Labeling of Russian Texts

Main Article Content

Leysan Maratovna Kadermyatova
Elena Victorovna Tutubalina

Abstract

A large body of work addresses semantic role labeling of English texts [1–3]. Semantic role labeling of Russian texts, however, remained unexplored for many years because training and test corpora were lacking. Research on the task became widespread after the FrameBank corpus appeared [4]. In this work, we analyze how the choice of word embedding model affects the quality of semantic role labeling of Russian texts. We computed micro- and macro-averaged F1 scores for the word2vec [5], fastText [6], and ELMo [7] embedding models. The experiments showed that, on the Russian FrameBank corpus, fastText models performed slightly better on average than word2vec models. The deep contextualized word representation model ELMo achieved higher micro- and macro-averaged F1 scores than the classical shallow embedding models.

Article Details

Author Biographies

Leysan Maratovna Kadermyatova

Postgraduate student of the Higher School of Information Technologies and Intelligent Systems at Kazan Federal University, QA Engineer.

Elena Victorovna Tutubalina

Candidate of Physical and Mathematical Sciences, senior researcher at the Higher School of Information Technologies and Intelligent Systems at Kazan Federal University. Research interests include natural language processing, machine learning, and medical informatics.

References

Christensen J., Mausam, Soderland S., and Etzioni O. (2011), An analysis of open information extraction based on semantic role labeling. In Proceedings of the sixth international conference on Knowledge capture, pp. 113–120.

Sameer Pradhan, Wayne Ward, Kadri Hacioglu, James Martin, and Dan Jurafsky. 2005. Semantic role labeling using different syntactic views. In Proceedings of the Association for Computational Linguistics 43rd annual meeting (ACL-2005), Ann Arbor, MI.

Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2017. Deep semantic role labeling: What works and what's next. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 473–483.

Olga Lyashevskaya and Egor Kashkin. 2015. FrameBank: a database of Russian lexical constructions. In International Conference on Analysis of Images, Social Networks and Texts, pages 350–360.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.

Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2227–2237.

Baker C. F., Fillmore C. J., and Lowe J. B. (1998), The Berkeley FrameNet project. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics Volume 1, pp. 86–90.

Ilya Kuznetsov. 2016. Automatic semantic role labelling in Russian language (in Russian). Ph.D. thesis, Higher School of Economics.

Shelmanov A., Smirnov I., Larionov D., Chistova E. Semantic Role Labeling with Pretrained Language Models for Known and Unknown Predicates // Proceedings of Recent Advances in Natural Language Processing, pages 619–628, Varna, Bulgaria, Sep 2–4, 2019.

Andrey Kutuzov and Elizaveta Kuzmenko. 2017. WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models, pages 155–161. Springer.

Khakhulin, Yuri Kuratov, Denis Kuznetsov, et al. 2018. Deeppavlov: Open-source library for dialogue systems. In Proceedings of ACL 2018, System Demonstrations, pages 122–127.

Shelmanov A., Devyatkin D. Semantic role labeling with neural networks for texts in Russian // Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference "Dialogue" (2017). — Vol. 1. — 2017. — P. 245–256.

Agarap, A. F. 2018. Deep Learning using Rectified Linear Units (ReLU), Neural and Evolutionary Computing, Vol. 1.

Luheng He, Mike Lewis, and Luke Zettlemoyer. Question-answer driven semantic role labeling: Using natural language to annotate natural language. In Proceedings of the 2015 conference on empirical methods in natural language processing (EMNLP 2015), pages 643–653, 2015.

Wen Tau Yih, Matthew Richardson, Chris Meek, Ming Wei Chang, and Jina Suh. The value of semantic parse labeling for knowledge base question answering. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), pages 201–206, 2016.

Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2010. Semantic role labeling for open information extraction. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading. Association for Computational Linguistics, Los Angeles, California, pages 52–60.

GS Osipov, IV Smirnov, and IA Tikhomirov. 2010. Relational-situational method for text search and analysis and its applications. Scientific and Technical Information Processing, 37(6):432–437.

Liu, D., Gildea, D., 2010. Semantic role features for machine translation. Proc. 23rd Int. Conf. on Computational Linguistics, pp. 716–724.

Kashkin, E.V., Lyashevskaya, O.N.: Semantic roles and construction net in Russian FrameBank [Semanticheskie roli i set’ konstrukcij v sisteme FrameBank] (in Russian). In: Computational Linguistics and Intellectual Technologies. Proceedings of International Conference “Dialog”, vol. 12-1, pp. 297–311. RSUH, Moscow (2013)

Lyashevskaya O. N., Kashkin E. V. Evaluation of frame-semantic role labeling in a case-marking language // Papers from the Annual International Conference "Dialogue" (2014). — 2014. — P. 350–365.


