Automatic Addition of SEO Metadata to News Articles Using Qwen-Coder

Hamza Salem
Alexander Sergeevich Toschev

Abstract

We summarize a previously developed pipeline for enriching news articles with structured data and present an updated configuration in which GPT-3, OpenAI’s third-generation natural language processing model, is replaced with Qwen-Coder. As before, the enrichment pipeline uses a dataset of 400 pages selected from Google News (Google’s free news aggregator) and remains compatible with the Google Rich Results Test, Google’s tool for validating structured data eligible for rich results. The updated configuration demonstrates that output quality comparable to GPT-3 can be achieved on a low-power desktop PC. We describe how this substitution reduces dependence on paid GPT services and report an evaluation of how similar the outputs produced by Qwen-Coder are to those of the GPT-based baseline. The results also indicate that the updated pipeline outperforms the GPT-based version. The proposed tools lower the barrier to adopting semantic markup practices and thereby broaden their application in digital journalism. Overall, the findings support Qwen-Coder as a cost-effective alternative to large proprietary models for metadata enrichment tasks.
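To make the enrichment step concrete, the following is a minimal sketch of how structured data might be attached to a news page as schema.org JSON-LD, the kind of markup the Rich Results Test inspects. It is an illustration under stated assumptions, not the authors' implementation: the function names build_news_jsonld and inject_jsonld, the chosen properties, and the sample values are hypothetical, and in the pipeline described above the field values would come from the language model rather than being passed in by hand.

```python
import json


def build_news_jsonld(headline, author, date_published, url):
    """Return a schema.org NewsArticle description as a plain dict.

    Hypothetical helper: the property selection is an assumption,
    not the field set used by the pipeline in the article.
    """
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "author": [{"@type": "Person", "name": author}],
        "datePublished": date_published,
        "mainEntityOfPage": url,
    }


def inject_jsonld(html, data):
    """Insert a JSON-LD <script> element just before </head>.

    If the page has no </head>, the script is prepended instead.
    """
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    script = f'<script type="application/ld+json">{payload}</script>'
    idx = html.lower().find("</head>")
    if idx == -1:
        return script + html
    return html[:idx] + script + html[idx:]


if __name__ == "__main__":
    page = "<html><head><title>Example</title></head><body></body></html>"
    data = build_news_jsonld(
        "Example headline", "Jane Doe", "2026-01-10", "https://example.com/a"
    )
    print(inject_jsonld(page, data))
```

The enriched page can then be pasted into the Rich Results Test to check whether the markup is recognized.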

How to Cite
Salem, H., and A. S. Toschev. “Automatic Addition of SEO Metadata to News Articles Using Qwen-Coder”. Russian Digital Libraries Journal, vol. 29, no. 1, Feb. 2026, pp. 287-303, doi:10.26907/1562-5419-2026-29-1-287-303.

References

1. Hui B., Yang J., Cui Z. et al. Qwen2.5-Coder Technical Report // arXiv. 2024. arXiv:2409.12186. URL: https://arxiv.org/abs/2409.12186 (access date: 10.01.2026).
2. Wang Q. Normalization and Differentiation in Google News: A Multi-Method Analysis of the World’s Largest News Aggregator: Thesis. Rutgers University, NJ, USA, 2020.
3. Rich Results Test. URL: https://search.google.com/test/rich-results (access date: 08.10.2024).
4. Bashir F., Warraich N.F. Systematic literature review of Semantic Web for distance learning // Interactive Learning Environments. 2020. Vol. 31. P. 527–543.
5. Breit A., Waltersdorfer L., Ekaputra F.J., Sabou M., Ekelhart A., Iana A., Paulheim H., Portisch J., Revenko A., Teije A.T., et al. Combining Machine Learning and Semantic Web: A Systematic Mapping Study // ACM Computing Surveys. 2023. Vol. 55. Art. 313.
6. Yu L. Introduction to the Semantic Web and Semantic Web Services. Boca Raton, FL, USA: Chapman and Hall/CRC, 2007.
7. Sporny M., Longley D., Kellogg G., Lanthaler M., Lindström N. JSON-LD 1.1: W3C Recommendation. 2020.
8. Salem H., Salloum H., Orabi O., Sabbagh K., Mazzara M. Enhancing News Articles: Automatic SEO Linked Data Injection for Semantic Web Integration // Applied Sciences. 2025. Vol. 15. Art. 1262. https://doi.org/10.3390/app15031262.
9. OpenAI. GPT-3 powers the next generation of apps. 2021. URL: https://openai.com/index/gpt-3-apps/ (access date: 16.01.2026).
10. Shadbolt N., Berners-Lee T., Hall W. The Semantic Web Revisited // IEEE Intelligent Systems. 2006. Vol. 21. P. 96–101.
11. Poturak M., Keco D., Tutnic E. Influence of search engine optimization (SEO) on business performance: Case study of private university in Sarajevo // International Journal of Research in Business and Social Science. 2022. Vol. 11. P. 59–68.
12. Chandrasekaran B., Josephson J.R., Benjamins V.R. What are ontologies, and why do we need them? // IEEE Intelligent Systems and Applications. 1999. Vol. 14. P. 20–26.
13. Sporny M., Longley D., Kellogg G., Lanthaler M., Lindström N. JSON-LD 1.0: W3C Recommendation. 2014.
14. Adida B., Birbeck M., McCarron S., Pemberton S. RDFa in XHTML: Syntax and processing: W3C Recommendation. 2008.
15. Iqbal M., Khalid M.N., Manzoor A.A., Malik M., Shaikh N.A. Search Engine Optimization (SEO): A Study of important key factors in achieving a better Search Engine Result Page (SERP) Position // Sukkur IBA Journal of Computing and Mathematical Sciences. 2022. Vol. 6. P. 1–15.
16. Alfiana F., Khofifah N., Ramadhan T., Septiani N., Wahyuningsih W., Azizah N.N., Ramadhona N. Apply the Search Engine Optimization (SEO) Method to determine Website Ranking on Search Engines // International Journal of Cyber Services and Management. 2023. Vol. 3. P. 65–73.
17. Mbonigaba C., Sujatha S., Kumar A.D., Vasuki M. Leveraging Digital Channels for Customer Engagement and Sales: Evaluating SEO, Content Marketing, and Social Media for Brand Growth // International Journal of Engineering Research and Modern Education. 2024. Vol. 9. P. 32–40.
18. Lewandowski D., Kammerer Y. Factors influencing viewing behaviour on search engine results pages: A review of eye-tracking research // Behaviour & Information Technology. 2020. Vol. 40. P. 1485–1515.
19. Rahman A.F.R., Alam H., Hartono R. Content Extraction from HTML Documents // Proceedings of the 1st International Workshop on Web Document Analysis (WDA2001). Seattle, WA, USA, 8 September 2001.
20. Lima R., Espinasse B., Oliveira H., Pentagrossa L., Freitas F. Information Extraction from the Web: An Ontology-Based Method Using Inductive Logic Programming // Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence. Herndon, VA, USA, 4–6 November 2013. P. 951–958.
21. Zheng S., Song R., Wen J.-R. Template-Independent News Extraction Based on Visual Consistency // Proceedings of the 22nd National Conference on Artificial Intelligence. Vancouver, BC, Canada, 22–26 July 2007. Washington, DC, USA: AAAI Press, 2007. P. 1507–1512.
22. Zhu W., Dai S., Song Y., Lu Z. Extracting news content with visual unit of web pages // Proceedings of the 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). Takamatsu, Japan, 1–3 June 2015. P. 1–5.
23. Gupta S., Kaiser G., Neistadt D., Grimm P. DOM-based content extraction of HTML documents // Proceedings of the 12th International Conference on World Wide Web. Budapest, Hungary, 20–24 May 2003. P. 207–214.
24. Mirzaaghaei M., Mesbah A. DOM-based test adequacy criteria for web applications // Proceedings of the 2014 International Symposium on Software Testing and Analysis. San Jose, CA, USA, 21–26 July 2014. P. 71–81.
25. Lin J. Divergence Measures Based on the Shannon Entropy // IEEE Transactions on Information Theory. 1991. Vol. 37, No. 1. P. 145–151. https://doi.org/10.1109/18.61115.
26. Corander J., Remes U., Koski T. On the Jensen-Shannon divergence and the variation distance for categorical probability distributions // Kybernetika. 2021. Vol. 57. P. 879–907.
27. Nielsen F. Jensen–Shannon divergence and diversity index: Origins and some extensions. Preprint. 2021.
28. Menéndez M.L., Pardo J.A., Pardo L., Pardo M.C. The Jensen–Shannon divergence // Journal of the Franklin Institute. 1997. Vol. 334. P. 307–318.
29. Qwen Team. Qwen3-Coder: GitHub repository. URL: https://github.com/QwenLM/Qwen3-Coder (access date: 11.11.2025).