Construction and Annotation of a Russian-Language News Corpus for Automated Detection of Political Manipulation
Abstract
This paper addresses the challenge of developing specialized corpus resources for the automated analysis of political manipulation in Russian-language media discourse. Although semantic text analysis and computational discourse analysis have advanced substantially in recent years, most existing corpora and annotation schemes are designed for English-language data and do not adequately capture the linguistic and discursive characteristics of Russian-language news media. The objective of this study is to construct a specialized corpus of Russian-language news texts and to develop an annotation scheme tailored to the automated analysis of political manipulation, with explicit consideration of the linguistic and discursive features of the Russian-language media environment. The study introduces a corpus of sentence-level fragments extracted from Russian-language news texts published between 2010 and 2019, together with an annotation scheme for manipulative techniques. The scheme is based on an adaptation of established international classifications of manipulative strategies and is reduced to a limited set of interpretable techniques relevant to Russian-language news discourse. The proposed framework covers emotional, argumentative, and contextual forms of manipulative influence. The resulting corpus and annotation scheme provide an empirical foundation for the development and evaluation of automated methods for analyzing political manipulation in Russian-language news media and may also support further research in media and political discourse.

This work is licensed under a Creative Commons Attribution 4.0 International License.
By submitting an article for publication in the Russian Digital Libraries Journal (RDLJ), the authors automatically consent to grant Kazan (Volga) Federal University (KFU) a limited license to use the materials, provided the article is accepted for publication. This means that KFU has the right to publish the article in the next issue of the journal (on the website or in print), to reprint it in RDLJ CD archives, and to include it in any information system or database produced by KFU.
All copyrighted materials are published in RDLJ with the consent of the authors. If an author objects to the publication of their materials on this site, the material can be removed upon written notification to the Editor.
Documents published in RDLJ are protected by copyright, and all rights are reserved by the authors. Authors independently monitor compliance with their rights to reproduce or translate their papers published in the journal. If material published in RDLJ is reprinted with permission by another publisher or translated into another language, a reference to the original publication must be included.
By submitting an article for publication in RDLJ, authors should take into account that publication on the Internet, on the one hand, provides unique opportunities for access to their content but, on the other hand, is a new form of information exchange in the global information society in which authors and publishers are not always protected against unauthorized copying or other use of copyrighted materials.
RDLJ is copyrighted. When using materials from the journal, the URL must be indicated: index.phtml page = elbib / rus / journal?. Changes, additions, or edits to the author's text are not allowed. Individual fragments of articles from the journal may be copied, and users may distribute, remix, adapt, and build upon an article, even commercially, as long as they credit that article for the original creation.
Requests for the right to reproduce or use any of the materials published in RDLJ should be addressed to the Editor-in-Chief, A.M. Elizarov, at the following address: amelizarov@gmail.com.
The publishers of RDLJ are not responsible for the views expressed in published opinion articles.
Authors are asked to download from this page the copyright agreement on the transfer of non-exclusive rights to use the work, sign it, and e-mail a scanned copy to the journal publisher.