Construction and Annotation of a Russian-Language News Corpus for Automated Detection of Political Manipulation


Nina Leonidovna Kulyulina

Abstract

This paper addresses the lack of specialized corpus resources for the automated analysis of political manipulation in Russian-language media discourse. Although semantic text analysis and computational discourse analysis have advanced substantially in recent years, most existing corpora and annotation schemes are built for English-language data and do not adequately capture the linguistic and discursive characteristics of Russian-language news media. The study therefore aims to construct a specialized corpus of Russian-language news texts and to develop an annotation scheme for the automated analysis of political manipulation that explicitly accounts for these characteristics. The corpus consists of sentence-level fragments extracted from Russian-language news texts published between 2010 and 2019 and is accompanied by an annotation scheme for manipulative techniques. The scheme adapts established international classifications of manipulative strategies and reduces them to a limited set of interpretable techniques relevant to Russian-language news discourse, covering emotional, argumentative, and contextual forms of manipulative influence. The resulting corpus and annotation scheme provide an empirical foundation for developing and evaluating automated methods for analyzing political manipulation in Russian-language news media, and they may also support further research on media and political discourse.
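The annotation unit and label structure summarized above can be pictured as a simple record type. The following is a minimal Python sketch under stated assumptions, not the paper's actual data format. Only three elements come from the abstract: the sentence-level fragment as the unit, the 2010 to 2019 publication range, and the three top-level forms of influence (emotional, argumentative, contextual). The names `Fragment`, `Category`, and `TECHNIQUES`, the individual technique labels, and the choice to allow several techniques per fragment (multi-label) are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    """The three top-level forms of manipulative influence named in the abstract."""
    EMOTIONAL = "emotional"
    ARGUMENTATIVE = "argumentative"
    CONTEXTUAL = "contextual"


# HYPOTHETICAL technique labels: placeholders for illustration only,
# not the technique inventory defined in the paper.
TECHNIQUES: dict[str, Category] = {
    "appeal_to_fear": Category.EMOTIONAL,
    "loaded_language": Category.EMOTIONAL,
    "false_dilemma": Category.ARGUMENTATIVE,
    "selective_omission": Category.CONTEXTUAL,
}


@dataclass
class Fragment:
    """One sentence-level fragment of a news text (hypothetical field names)."""
    text: str
    year: int
    # Assumption: a fragment may carry zero, one, or several technique labels.
    techniques: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # The corpus covers texts published between 2010 and 2019.
        if not 2010 <= self.year <= 2019:
            raise ValueError(f"year {self.year} is outside 2010-2019")
        # Reject labels that are not part of the (hypothetical) label set.
        unknown = self.techniques - set(TECHNIQUES)
        if unknown:
            raise ValueError(f"unknown technique labels: {sorted(unknown)}")

    def categories(self) -> set[Category]:
        """Map the fragment's technique labels to their top-level categories."""
        return {TECHNIQUES[t] for t in self.techniques}

    @property
    def is_manipulative(self) -> bool:
        # A fragment with no technique labels counts as non-manipulative.
        return bool(self.techniques)
```

Because every fine-grained label maps to exactly one top-level category in a single table, a classifier could be trained and evaluated at either level of granularity from the same annotations. This is a design choice of the sketch, not a claim about how the paper organizes its scheme.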


How to Cite
Kulyulina, N. L. “Construction and Annotation of a Russian-Language News Corpus for Automated Detection of Political Manipulation”. Russian Digital Libraries Journal, vol. 29, no. 3, June 2026, pp. 782-97, doi:10.26907/1562-5419-2026-29-3-782-797.
