Abstractive Summarization for Trade News Analysis Based on a New Domain-Specific Dataset
Abstract
We present TradeNewsSum, a corpus for abstractive summarization of international trade news covering Russian- and English-language publications from domain-specific sources. All summaries are manually prepared following unified guidelines. We conducted fine-tuning experiments with transformer and sequence-to-sequence models and performed automatic evaluation using an LLM-as-a-judge scheme. LLaMA 3.1 in instruction-prompting mode achieved the best results, showing high scores across all metrics, including factual completeness.
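The abstract refers to an LLM-as-a-judge evaluation scheme without reproducing the prompt or rubric. The sketch below illustrates the general pattern only: the criteria names, the 1-5 scale, the prompt wording, and the llm callable (any function that sends a prompt to a judge model such as LLaMA 3.1 and returns its reply) are illustrative assumptions, not the authors' actual protocol.

import re
from typing import Callable

# Assumed rubric; the paper's exact criteria are not given here.
CRITERIA = ("coherence", "fluency", "factual completeness")

JUDGE_PROMPT = (
    "You are evaluating a summary of a trade news article.\n"
    "Article:\n{article}\n\nSummary:\n{summary}\n\n"
    "Rate the summary for {criterion} on a scale from 1 (worst) "
    "to 5 (best). Answer with a single integer."
)

def judge_summary(article: str, summary: str,
                  llm: Callable[[str], str]) -> dict[str, int]:
    """Score one candidate summary, one judge call per criterion."""
    scores: dict[str, int] = {}
    for criterion in CRITERIA:
        reply = llm(JUDGE_PROMPT.format(
            article=article, summary=summary, criterion=criterion))
        match = re.search(r"[1-5]", reply)  # first rating digit in the reply
        scores[criterion] = int(match.group()) if match else 0
    return scores

In practice, llm would wrap whatever judge model is used, and averaging judge_summary scores over a test set yields per-criterion system scores.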
References
2. Banerjee S., Lavie A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments // Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 2005. P. 65–72.
3. Fabbri A. R. et al. Multi-News: A large-scale multi-document summarization dataset and abstractive hierarchical model // arXiv preprint arXiv:1906.01749. 2019.
4. Fischer T., Remus S., Biemann C. Measuring faithfulness of abstractive summaries // Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022). 2022. P. 63–73.
5. Fu J. et al. GPTScore: Evaluate as you desire // arXiv preprint arXiv:2302.04166. 2023.
6. Gavrilov D., Kalaidin P., Malykh V. Self-attentive model for headline generation // Advances in Information Retrieval: 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14–18, 2019, Proceedings, Part II 41. Springer International Publishing, 2019. P. 87–93.
7. Goyal T., Li J. J., Durrett G. News summarization and evaluation in the era of GPT-3 // arXiv preprint arXiv:2209.12356. 2022.
8. Grusky M., Naaman M., Artzi Y. Newsroom: A dataset of 1.3 million summaries with diverse extractive strategies // arXiv preprint arXiv:1804.11283. 2018.
9. Gusev I. Dataset for automatic summarization of Russian news // Artificial Intelligence and Natural Language: 9th Conference, AINL 2020, Helsinki, Finland, October 7–9, 2020, Proceedings 9. Springer International Publishing, 2020. P. 122–134.
10. Hasan T. et al. XL-Sum: Large-scale multilingual abstractive summarization for 44 languages // arXiv preprint arXiv:2106.13822. 2021.
11. Kryściński W. et al. Neural text summarization: A critical evaluation // arXiv preprint arXiv:1908.08960. 2019.
12. Lewis M. et al. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension // arXiv preprint arXiv:1910.13461. 2019.
13. Liu Y. et al. G-Eval: NLG evaluation using GPT-4 with better human alignment // arXiv preprint arXiv:2303.16634. 2023.
14. Narayan S., Cohen S. B., Lapata M. Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization // arXiv preprint arXiv:1808.08745. 2018.
15. Paulus R., Xiong C., Socher R. A deep reinforced model for abstractive summarization // arXiv preprint arXiv:1705.04304. 2017.
16. Raffel C. et al. Exploring the limits of transfer learning with a unified text-to-text transformer // Journal of Machine Learning Research. 2020. Vol. 21, No. 140. P. 1–67.
17. Rush A.M., Chopra S., Weston J. A neural attention model for abstractive sentence summarization // arXiv preprint arXiv:1509.00685. 2015.
18. Sandhaus E. The New York Times Annotated Corpus Overview [Electronic resource]. Philadelphia: Linguistic Data Consortium, 2008. (LDC Catalog No. LDC2008T19). https://gwern.net/doc/ai/dataset/2008-sandhaus.pdf (accessed: 21.05.2025).
19. Scialom T. et al. MLSUM: The multilingual summarization corpus // arXiv preprint arXiv:2004.14900. 2020.
20. See A., Liu P. J., Manning C. D. CNN/Daily Mail dataset for summarization [Electronic resource]. 2016. https://github.com/abisee/cnn-dailymail (accessed: 07.04.2025).
21. See A., Liu P.J., Manning C.D. Get to the point: Summarization with pointer-generator networks // arXiv preprint arXiv:1704.04368. 2017.
22. Varab D., Schluter N. MassiveSumm: a very large-scale, very multilingual, news summarisation dataset // Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. P. 10150–10161.
23. Vaswani A. et al. Attention is all you need // Advances in Neural Information Processing Systems. 2017. Vol. 30.
24. Xin L., Liutova D., Malykh V. Cross-Language Summarization in Russian and Chinese Using Reinforcement Learning // International Conference on Analysis of Images, Social Networks and Texts. Cham: Springer Nature Switzerland, 2024. P. 179–192.
25. Yutkin M. Lenta.Ru News Dataset [Electronic resource]. 2018. https://github.com/yutkin/Lenta.Ru-News-Dataset (accessed: 04.05.2025).
26. Zhang J. et al. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization // International Conference on Machine Learning. PMLR, 2020. P. 11328–11339.
27. Zhang T. et al. BERTScore: Evaluating text generation with BERT // arXiv preprint arXiv:1904.09675. 2019.

This work is licensed under a Creative Commons Attribution 4.0 International License.
By submitting an article for publication in the Russian Digital Libraries Journal (RDLJ), the authors automatically consent to grant Kazan (Volga Region) Federal University (KFU) a limited license to use the materials (provided, of course, that the article is accepted for publication). This means that KFU has the right to publish the article in the next issue of the journal (on the website or in printed form), to reprint it in the CD archives of RDLJ, and to include it in any information system or database produced by KFU.
All copyrighted materials are placed in RDLJ with the consent of the authors. If any author objects to the publication of their materials on this site, the materials can be removed upon written notification to the Editor.
Documents published in RDLJ are protected by copyright, and all rights are reserved by the authors. Authors independently monitor compliance with their rights to reproduce or translate their papers published in the journal. If material published in RDLJ is reprinted by another publisher with permission, or translated into another language, a reference to the original publication must be given.
By submitting an article for publication in RDLJ, authors should take into account that publication on the Internet, on the one hand, provides unique opportunities for accessing the content but, on the other hand, represents a new form of information exchange in the global information society, in which authors and publishers are not always protected against unauthorized copying or other use of copyrighted materials.
RDLJ is copyrighted. When using materials from the journal, the URL must be indicated: index.phtml?page=elbib/rus/journal. Any change, addition, or editing of the author's text is not allowed. Copying of individual fragments of articles from the journal is permitted: in line with the CC BY license, users may distribute, remix, adapt, and build upon an article, even commercially, as long as they credit the article as the original creation.
Requests for the right to reproduce or use any of the materials published in RDLJ should be addressed to the Editor-in-Chief A. M. Elizarov at the following address: amelizarov@gmail.com.
The publishers of RDLJ are not responsible for the views set out in published opinion articles.
We suggest that the authors of articles download the copyright agreement on the transfer of non-exclusive rights to use the work from this page, sign it, and send a scanned copy to the journal publisher's e-mail address.