Russian Digital Libraries Journal

Published since 1998
ISSN 1562-5419

Search Results

An Algorithmic Framework for Accurately Extracting Main Content from News Websites

Hamza Salem, Alexander Sergeevich Toschev
931-942
Abstract:

A new precise MCE algorithm for extracting the main content from news websites is presented. The proposed algorithm uses analysis of the Document Object Model (DOM) structure and content density metrics to identify and extract the informational core of a web page. The implemented approach combines three key features: the maximum number of direct child elements containing text, the maximum textual content without child elements containing text, and the closest position to the average node depth. The algorithm demonstrated superior performance compared to existing solutions such as Boilerpipe and Readability, achieving 99.96% precision, 99.69% recall, and 99.80% F1-score on a comprehensive dataset of 500 diverse web pages. Its language-independent design makes the algorithm particularly effective for extracting multilingual content, including languages with complex structures such as Arabic.

Keywords: NLP, Data Extraction, Language-Independent Algorithm, RAG (Retrieval-Augmented Generation).
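
The three features named in the abstract can be illustrated with a minimal sketch. It is not the authors' MCE implementation: the abstract does not say how the features are computed or combined, so the reading of "textual content without child elements" as text held directly by a node, the normalized-sum score, and all names below (`Node`, `TreeBuilder`, `find_main_node`, `SAMPLE`) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
IGNORED_TEXT_TAGS = {"script", "style"}


@dataclass
class Node:
    tag: str
    depth: int
    own_text: str = ""          # text held directly by this element
    children: list = field(default_factory=list)

    def has_text(self) -> bool:
        return bool(self.own_text.strip())


class TreeBuilder(HTMLParser):
    """Builds a simplified element tree; not a full HTML5 parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", 0)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, depth=len(self.stack))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack[-1].tag not in IGNORED_TEXT_TAGS:
            self.stack[-1].own_text += data


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def subtree_text(node):
    return " ".join(n.own_text.strip() for n in walk(node) if n.has_text())


def find_main_node(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    nodes = [n for n in walk(builder.root) if n is not builder.root]
    if not nodes:
        return None
    avg_depth = sum(n.depth for n in nodes) / len(nodes)
    # Feature 1: number of direct children that contain text.
    text_children = {id(n): sum(c.has_text() for c in n.children) for n in nodes}
    # Feature 2 (assumed reading): amount of text held directly by the node.
    own_length = {id(n): len(n.own_text.strip()) for n in nodes}
    # Feature 3: distance from the average node depth (smaller is better).
    depth_gap = {id(n): abs(n.depth - avg_depth) for n in nodes}
    max_children = max(text_children.values()) or 1
    max_length = max(own_length.values()) or 1
    max_gap = max(depth_gap.values()) or 1

    def score(n):  # assumed combination: normalized sum of the three features
        k = id(n)
        return (text_children[k] / max_children
                + own_length[k] / max_length
                - depth_gap[k] / max_gap)

    return max(nodes, key=score)


SAMPLE = (
    "<html><body>"
    "<nav><a href='/'>Home</a><a href='/about'>About</a></nav>"
    "<article><h1>Headline</h1>"
    "<p>First paragraph of the story.</p>"
    "<p>Second paragraph with more detail.</p>"
    "<p>Third paragraph closes the story.</p></article>"
    "<footer><span>Contact</span></footer>"
    "</body></html>"
)

main = find_main_node(SAMPLE)
print(main.tag)
print(subtree_text(main))
```

Because the scoring uses only tree shape and text length, the sketch makes no assumptions about the language of the page, which is in the spirit of the language-independent design the abstract describes.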

Available Internet: From the WAI Initiative to Russian Practice

Tatyana Alekseevna Polilova
119-144
Abstract:

For many years, the W3C (World Wide Web Consortium) has been promoting the WAI (Web Accessibility Initiative) project, the main slogan of which is formulated as "Making the Web accessible". As part of the WAI initiative, WCAG (Web Content Accessibility Guidelines) are being developed to help website developers take into account the needs of people with disabilities. GOST R 52872-2019 has been developed in the Russian Federation, based on WCAG recommendations. Some provisions of GOST R 52872-2019 are presented in this paper.


Law No. 181-FZ on the social protection of persons with disabilities, in force since 1995, requires developers of information resources to create conditions in which people with disabilities can freely use communications and information. Its general provisions are implemented in the directive documents of the relevant departments. The paper considers the provisions of a 2023 order of the Ministry of Finance that determine how information is presented on the websites of organizations in a form convenient for people with visual and hearing impairments. These provisions encourage developers of websites for organizations subordinate to government bodies at various levels in the Russian Federation to ensure sufficient text contrast, adhere to adaptive design, and equip non-text objects with a text layer or comments. This simplifies the work of people with disabilities on the Internet and contributes to the development of artificial intelligence tools.

Keywords: WAI initiative, WCAG recommendations, GOST R 52872-2019, digital content, accessibility for people with disabilities.
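
The requirement of "sufficient text contrast" can be made concrete with the contrast-ratio formula defined in WCAG 2.x. The sketch below is a minimal illustration, not a statement of what the Ministry of Finance order or GOST R 52872-2019 prescribes: the abstract gives no threshold, so the 4.5:1 value (WCAG level AA for normal-size text) and the function names are assumptions.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light, as in the WCAG definition."""
    c = channel_8bit / 255
    # WCAG 2.0/2.1 publish 0.03928 as the cutoff; for 8-bit input it gives
    # the same result as the sRGB value 0.04045.
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1:1 (identical colors) to 21:1 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa_normal_text(foreground, background, threshold=4.5):
    # threshold=4.5 is the WCAG level AA value for normal-size text (assumed here).
    return contrast_ratio(foreground, background) >= threshold


WHITE = (255, 255, 255)
print(round(contrast_ratio((0, 0, 0), WHITE), 2))
print(meets_aa_normal_text((0x76, 0x76, 0x76), WHITE))
print(meets_aa_normal_text((0x77, 0x77, 0x77), WHITE))
```

A check of this kind can be run over the foreground and background color pairs in a site's stylesheet to flag text that falls below the chosen threshold.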

Web resources of Russian universities: the self-organization or administrative impact-case study of St. Petersburg State University

Andrey Anatolyevich Pechnikov
283-301
Abstract: The web space of an organization is the set of its web sites connected by hyperlinks. The main question considered in the article is whether this web space is self-organizing, i.e. whether its elements become ordered through their internal interaction without external influence, or whether the external (so-called "administrative") impact is strong enough to be detected. The article proposes a general approach, demonstrated on the web space of St. Petersburg State University. In this case, the analysis gives a positive answer to the question of whether there is significant administrative influence on the university's web space.
Keywords: hyperlink, web site, web graph, dynamic model, administrative impact.

Web application development based on technologies, resources and services of the Geoportal of the Institute of Computational Modelling SB RAS

O.E. Yakubailik, A.A. Kadochnikov, A.V. Tokarev
Abstract: The geoportal is a mapping web site; it can be described as specialized software and technologies for spatial data processing. The geoportal's main task is to provide the user with tools and services for storing, cataloguing, publishing and downloading spatial (geographic) data, searching and filtering by metadata, interactive web visualization, and direct access to geodata through web mapping services. The geoportal developed at ICM SB RAS, with its set of components and services, has become a GIS platform for creating a number of applied GIS web applications. The article describes the experience of designing and developing these systems.
Keywords: spatial data processing, geodata, web mapping services, geoportal, GIS web applications.

© 2015-2025 Kazan Federal University; Institute of the Information Society