Russian Digital Libraries Journal

Published since 1998
ISSN 1562-5419

Search Results

Image Classification Using Reinforcement Learning

Artem Aleksandrovich Elizarov , Evgenii Viktorovich Razinkov
1172-1191
Abstract:

Reinforcement learning has been developing rapidly in recent years. As a result, attempts are being made to apply it to computer vision problems, in particular to image classification. Computer vision remains one of the most pressing areas of artificial intelligence.


The article proposes an image classification method in the form of a deep neural network trained with reinforcement learning. The method reduces classification to a contextual multi-armed bandit problem, which is solved using various strategies for balancing exploration and exploitation together with reinforcement learning algorithms. The strategies considered are ε-greedy, ε-softmax, ε-decay-softmax, and the UCB1 method; the reinforcement learning algorithms considered are DQN, REINFORCE, and A2C. The influence of various parameters on the efficiency of the method is analyzed, and options for its further development are proposed.
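The exploration strategies named in this abstract can be illustrated with a minimal sketch. It assumes classification is treated as a bandit in which each class label is an arm and the reward is 1 for a correct label and 0 otherwise. All function names are hypothetical, the softmax sampler is a plain temperature-based variant (the abstract does not define the paper's own softmax variants, which may differ), and the abstract does not say how UCB1 is adapted to the contextual setting.

```python
import math
import random


def epsilon_greedy(scores, epsilon, rng):
    """Pick a random arm with probability epsilon, else the best-scoring arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(scores))
    return max(range(len(scores)), key=lambda arm: scores[arm])


def softmax_sample(scores, temperature, rng):
    """Sample an arm with probability proportional to exp(score / temperature)."""
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def ucb1(mean_rewards, pull_counts, total_pulls):
    """UCB1: mean reward plus an exploration bonus; untried arms go first."""
    for arm, count in enumerate(pull_counts):
        if count == 0:
            return arm
    return max(
        range(len(mean_rewards)),
        key=lambda arm: mean_rewards[arm]
        + math.sqrt(2.0 * math.log(total_pulls) / pull_counts[arm]),
    )


def reward(chosen_label, true_label):
    """Bandit feedback for classification: 1 for a correct label, else 0."""
    return 1.0 if chosen_label == true_label else 0.0
```

In such a setup the `scores` would come from the network's output for the current image, so the strategy only decides how often to deviate from the network's current best guess.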

Keywords: machine learning, image classification, reinforcement learning, contextual multi-armed bandit problem.

Analysing Machine Learning Models based on Explainable Artificial Intelligence Methods in Educational Analytics

Dmitriy Arturovich Minullin, Fail Mubarakovich Gafarov
294-315
Abstract:

Predicting early dropout of students at Russian universities is an urgent problem that requires new approaches. One way to address it is to build predictive systems based on the student data available in university information systems. This paper investigates machine learning models for predicting early student dropout, trained on student characteristics and performance data. The main scientific novelty of the work lies in the use of explainable AI methods to interpret and explain the behavior of the trained models. These methods show which input features (student characteristics) have the greatest influence on the prediction results of the trained models, and can also help to explain why the models make certain decisions. The findings expand the understanding of how various factors influence early student dropout.

Keywords: educational analytics, data mining, machine learning, explainable AI.

Automatic Annotation of Training Datasets in Computer Vision using Machine Learning Methods

Aleksey Konstantinovich Zhuravlev, Karen Albertovich Grigorian
718-729
Abstract:

This paper addresses the issue of automatic annotation of training datasets in the field of computer vision using machine learning methods. Data annotation is a key stage in the development and training of deep learning models, yet the process of creating labeled data often requires significant time and labor. This paper proposes a mechanism for automatic annotation based on the use of convolutional neural networks (CNN) and active learning methods.


The proposed methodology includes the analysis and evaluation of existing approaches to automatic annotation. The effectiveness of the proposed solutions is assessed on publicly available datasets. The results demonstrate that the proposed method significantly reduces the time required for data annotation, although operator intervention is still necessary.


The literature review includes an analysis of modern annotation methods and existing automatic systems, providing a better understanding of the context and advantages of the proposed approach. The conclusion discusses achievements, limitations, and possible directions for future research in this field.

Keywords: computer vision, machine learning, automatic data annotation, training datasets, image segmentation.

Analysis and Development of the MLOps Pipeline for ML Model Deployment

Rustem Raficovich Yamikov, Karen Albertovich Grigorian
177-196
Abstract:

The growth in the number of IT products with machine-learning features is increasing the relevance of automating machine-learning processes. The use of MLOps techniques is aimed at providing training and efficient deployment of applications in a production environment by automating side infrastructure issues that are not directly related to model development.


In this paper, we review the components, principles, and approaches of MLOps and analyze existing platforms and solutions for building machine learning pipelines. In addition, we propose an approach to build a machine learning pipeline based on basic DevOps tools and open-source libraries.

Keywords: MLOps, DevOps, CI/CD, CT, ML, machine learning pipeline.

Automatic Annotation of HTML Documents using the Microdata Standard

Timur Ferdinandovich Ibragimov, Alexander Andreevich Ferenets
730-744
Abstract:

The development of an application based on machine learning methods for automatic annotation of web pages according to the Microdata standard is described, with the possibility of extending it to other standards and of injecting data into JSX files. Datasets were collected and prepared for training machine learning (ML) models. The ML model metrics were collected and analyzed.

Keywords: Microdata, semantic markup, HTML5, search engine optimization (SEO), search engines, machine learning, schema.org, semantic web, markup standards, SEO automation.

Recommendation System for Selection of Players in Team Sports Built on the Basis of Machine Learning

Rinat Rustemovich Shigapov, Alexander Andreevich Ferenets
257-280
Abstract:

This article describes the development of a machine-learning-based recommender system for selecting players. The system is presented using hockey as an example, with the possibility of extending it to other team sports. For each sport, the different roles and characteristics of the players were considered. The article analyzes information about hockey, football, basketball, and volleyball. The characteristics of the players are structured and divided into general groups. For each parameter, coefficients are derived that show its impact on the result of a match. Various machine learning algorithms were used to build the model. A web interface for the application has been created.

Keywords: sports, hockey, selection of players, recommender system, machine learning.

Machine learning methods for determining the relationship between academic success and data of social network profile

Ilyas Raisovich Ihsanov, Irina Sergeevna Shakhova
95-118
Abstract: The paper proposes a machine learning model for determining the relationship between social network profile data and students' academic success, and for predicting that success from the data.
Keywords: machine learning, social networks, psychometrics, academic success, education, abiturient.

Signature Methods for Time Series Analysis

Kirill Alekseevich Mashchenko
681-700
Abstract:

Signature methods are a powerful tool for time series analysis, transforming them into a form suitable for machine learning tasks. The article examines the fundamental concepts of path signatures, their properties, and geometric interpretation, as well as computational methods for various types of time series. Examples of signature method applications in different fields, including finance, medicine, and education, are presented, highlighting their advantages over traditional approaches. Special attention is given to synthetic data generation based on signatures, which is particularly relevant when working with limited datasets. The experimental results on generating and predicting student digital learning trajectories demonstrate the effectiveness of signature methods for machine learning applications in time series analysis and forecasting.
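To make the notion of a path signature concrete, the following minimal sketch computes the first two signature levels of a piecewise-linear path, adding one straight segment at a time with Chen's identity. The function name `signature_depth2` is hypothetical, and this is a textbook construction rather than the computational method used in the article.

```python
def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path given as a list of points.

    Returns (level1, level2): level1[i] is the total increment in coordinate i,
    level2[i][j] is the iterated integral of coordinate i against coordinate j.
    """
    dim = len(path[0])
    level1 = [0.0] * dim
    level2 = [[0.0] * dim for _ in range(dim)]
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one straight segment:
        # level2 += level1 (x) delta + (delta (x) delta) / 2
        for i in range(dim):
            for j in range(dim):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        # Level 1 is simply the accumulated increment.
        for i in range(dim):
            level1[i] += delta[i]
    return level1, level2


# A time series can be turned into a path by pairing each value with its index.
series = [1.0, 3.0, 2.0]
time_augmented_path = [(float(t), x) for t, x in enumerate(series)]
features = signature_depth2(time_augmented_path)
```

For the path (0, 0) → (1, 0) → (1, 1) this gives level 1 equal to [1, 1] and level 2 equal to [[0.5, 1], [0, 0.5]]; the antisymmetric part of level 2 is the signed (Lévy) area, which is one source of the geometric interpretation the abstract mentions.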

Keywords: signature, signature methods, time series, data generation, trajectory analysis, digital footprint.

Using Machine Learning to Enhance Test Quality

Ramil Radikovich Miniukov, Mikhail Mikhailovich Abramskiy
701-717
Abstract:

This study focuses on the application of machine learning methods to improve the quality of test items. The research includes a review of the subject area and the implementation of two enhancement methods: similar question retrieval and distractor quality assessment. The first method involves testing five transformer-based models for generating text embeddings and six clustering algorithms. The second method uses the same transformer models in combination with three classification algorithms. Experimental results demonstrated the high effectiveness of the proposed approaches in solving both tasks.

Keywords: test item analysis, distractors, examination process, assessments, test quality improvement.

Experiment in building an automatic object-oriented sentiment detection system based on the syntactic and semantic analyzer

Pavel Yurievich Polyakov, Maria Viktorovna Kalinina, Vladimir Vladimirovich Pleshko
185-202
Abstract:

This paper focuses on the use of a linguistics-based method for automatic object-oriented sentiment analysis. The study was conducted as part of the SentiRuEval testing cycle for automatic sentiment analysis systems. The original task was to extract users’ opinions (positive, negative, neutral) about telecom companies expressed in tweets and news. In this study, news was excluded from the dataset because formal news texts differ significantly from informal ones in structure and vocabulary and therefore demand a different approach. Only a linguistic approach based on syntactic and semantic analysis was used. In this approach, a sentiment-bearing word or expression is linked to its target object at one of two stages, which are performed in succession. The first stage matches semantic templates against the dependency tree, and the second stage applies heuristics for linking sentiment expressions to their target objects when no syntactic relation exists between them. No machine learning was used. The method achieved high quality, roughly matching the best results of machine learning methods and hybrid approaches.

Keywords: sentiment analysis, object-oriented sentiment analysis, aspect-based sentiment analysis, opinion mining, syntactic and semantic analysis, semantic templates.

Calculated emotions model in intellectual software systems

Maxim Olegovich Talanov, Alexander Sergeevich Toschev
231-241
Abstract: We have studied emotions from philosophical, psychological, and neurophysiological perspectives and, taking these into account, describe a cognitive architecture. Based on Lövheim's “Emotion Cube”, Plutchik's “Wheel of emotions”, Tomkins' “Theory of affects”, and Marvin Minsky's thinking model, we describe the use of emotions as factors that influence computing processes. We also indicate the possibility of using emotions in intelligent question-answer systems.
Keywords: artificial intelligence, virtual assistant, social agent, emotions, thinking models, calculated emotions.

Automation of Android applications interactive prototypes development based on low-fidelity wireframes

Anatolii Sergeevich Hlopunov, Irina Sergeevna Shakhova
160-172
Abstract: The paper describes mechanisms for automating the development of interactive prototypes of Android applications from handwritten wireframes. The automation uses machine learning methods to recognize the handwritten wireframes. A mobile Android application has been developed to let users interact with these mechanisms.
Keywords: prototyping, UI, UX, mobile applications, user interface.

Neural Network Architecture of Embodied Intelligence

Ayrat Rafkatovich Nurutdinov
598-655
Abstract:

In recent years, progress in artificial intelligence (AI) and machine learning has been driven by the development of large language models (LLMs) based on deep neural networks. At the same time, despite their substantial capabilities, LLMs have fundamental limitations: spontaneous unreliability in facts and judgments; simple errors that are at odds with their generally high competence; credulity, manifested in a willingness to accept a user's knowingly false claims as true; and a lack of knowledge about events that occurred after training was completed.


Probably the key reason is that bioinspired intelligence learning occurs through the assimilation of implicit knowledge by an embodied form of intelligence to solve interactive real-world physical problems. Bioinspired studies of the nervous systems of organisms suggest that the cerebellum, which coordinates movement and maintains balance, is a prime candidate for uncovering methods for realizing embodied physical intelligence. Its simple repetitive structure and ability to control complex movements offer hope for the possibility of creating an analog to adaptive neural networks.


This paper explores the bioinspired architecture of the cerebellum as a form of analog computational networks capable of modeling complex real-world physical systems. As a simple example, a realization of embodied AI in the form of a multi-component model of an octopus tentacle is presented, demonstrating the potential in creating adaptive physical systems that learn and interact with the environment.

Keywords: artificial neural network, large language model, implicit learning, cerebellum model, analog computing, embodied cognition, soft robotics, octopus.

AI in Cancer Prevention: a Retrospective Study

Petr Aleksandrovich Philonenko, Vladimir Nikolaevich Kokh, Pavel Dmitrievich Blinov
1253-1266
Abstract:

This study investigates the feasibility of effectively solving population-scale cancer screening problems using artificial intelligence (AI) methods that predict malignant neoplasm risk based on minimal electronic health record (EHR) data – medical diagnosis and service codes. To address the formulated problem, we considered a broad spectrum of modern approaches, including classical machine learning methods, survival analysis, deep learning, and large language models (LLMs). Numerical experiments demonstrated that gradient boosting using survival analysis models as additional predictors possesses the best ability to rank patients by cancer risk level, enabling consideration of both population-level and individual risk factors for malignant neoplasms. Predictors constructed from EHR data include demographic characteristics, healthcare utilization patterns, and clinical markers. This solution was tested in retrospective experiments under the supervision of specialized oncologists. In the retrospective experiment involving more than 1.9 million patients, we established that the risk group captures up to 5.4 times more patients with cancer at the same level of medical examinations. The investigated method represents a scalable solution using exclusively diagnosis and service codes, requiring no specialized infrastructure and integrable into oncological vigilance processes, making it applicable for population-scale cancer screening.

Keywords: AI in medicine, cancer prevention, retrospective experiments.

The system of emotional appraisal based on reinforcement learning and bio-inspired methods

Evgeniya Yurievna Mayorova, Maxim Olegovich Talanov, Robert Lowe
193-215
Abstract: I research and lecture in Cognitive Science where my particular interest is in emotions – neural networks modeling and applications – and animal and human learning.
Keywords: appraisal, emotional appraisal, reinforcement learning.

Application of the Douglas-Peucker Algorithm in Online Authentication of Remote Work Tools for Specialist Training in Higher Education Group of Scientific Specialties (UGSN) 10.00.00

Anton Grigorievich Uymin, Vladimir Sergeyevich Grekov
679-694
Abstract:

In today's world, digital technologies are penetrating all aspects of human activity, including education and work. Since 2019, when educational systems around the world began actively shifting to distance learning in response to global challenges, there has been an urgent need to develop and implement reliable identification and authentication technologies. These technologies are needed to ensure the authenticity of student work and to protect against falsification of academic achievements, especially in higher education within the group of specialties and directions (UGSN) 10.00.00 - Information Security, where laboratory and practical work play a key role in the educational process.


The problem lies in the need to optimize the flow of incoming data, which, first, can affect the retraining of the neural network core of the recognition system, and second, impose excessive requirements on the network's bandwidth. To solve this problem, efficient preprocessing of gesture data is required to simplify their trajectories while preserving the key features of the gestures.


This article proposes the use of the Douglas–Peucker algorithm for preliminary processing of mouse gesture trajectory data. This algorithm significantly reduces the number of points in the trajectories, simplifying them while preserving the main shape of the gestures. The data with simplified trajectories are then used to train neural networks.


The experimental part of the work showed that the application of the Douglas–Peucker algorithm allows for a 60% reduction in the number of points in the trajectories, leading to an increase in gesture recognition accuracy from 70% to 82%. Such data simplification contributes to speeding up the neural networks' training process and improving their operational efficiency.


The study confirmed the effectiveness of using the Douglas–Peucker algorithm for preliminary data processing in mouse gesture recognition tasks. The article suggests directions for further research, including the optimization of the algorithm's parameters for different types of gestures and exploring the possibility of combining it with other machine learning methods. The obtained results can be applied to developing more intuitive and adaptive user interfaces.
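The trajectory simplification described in this abstract can be sketched as follows. This is a minimal recursive Douglas–Peucker implementation for 2D points under stated assumptions: the function names and the `tolerance` parameter are illustrative, distance is measured to the chord segment (a common variant of the perpendicular distance to the line), and the tolerance values used in the article are not given in the abstract.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all 2D tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping points farther than tolerance from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = point_segment_distance(points[i], first, last)
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist <= tolerance:
        # Every intermediate point is close to the chord: drop them all.
        return [first, last]
    # Otherwise keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right
```

A larger `tolerance` removes more points, so the value trades trajectory size against how much of the gesture's shape is preserved.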

Keywords: authentication, biometric identification, remote work, distance learning, Douglas–Peucker algorithm, data preprocessing, neural network, HID devices, mouse gesture trajectories, data optimization.

Development of a Method for User Segmentation using Clustering Algorithms and Advanced Analytics

Daniil Andreevic Klinov, Karen Albertovich Grigorian
137-147
Abstract:

The article is devoted to creating an effective solution for user segmentation. It analyzes existing user segmentation services, approaches to user segmentation (ABCDx segmentation, demographic segmentation, segmentation based on a user journey map), and clustering algorithms (K-means, Mini-Batch K-means, DBSCAN, Agglomerative Clustering, Spectral Clustering). The aim is a “flexible” segmentation solution that adapts to each user sample. Analysis of variance (ANOVA test) and clustering metrics are used to assess the quality of the segmentation. On this basis, an effective user segmentation solution has been developed using advanced analytics and machine learning.
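As an illustration of how an ANOVA test can assess segmentation quality, the sketch below computes the one-way ANOVA F statistic for a single user metric across clusters. The function name and the sample data are hypothetical; how the article combines this test with other clustering metrics is not described in the abstract.

```python
def anova_f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    all_values = [value for group in groups for value in group]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(group) / len(group) for group in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2 for group, mean in zip(groups, means)
    )
    # Variation of the values around their own group mean.
    ss_within = sum(
        (value - mean) ** 2 for group, mean in zip(groups, means) for value in group
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical metric (for example, sessions per week) for three user clusters.
clusters = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = anova_f_statistic(clusters)
```

A large F value indicates that the metric differs between clusters far more than within them, which is one sign that the segments are meaningfully separated on that metric.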

Keywords: Segmentation, clustering, analysis of variance, machine learning, advanced analytics, ANOVA test, product analytics.

Variations in Microseismic Noise Spectra as a Forecast Parameter of Earthquakes in the Baikal Rift System

Lyudmila Petrovna Braginskaya, Andrey Pavlovich Grigoryuk, Valeriy Viktorovich Kovalevskiy, Anna Alexandrovna Dobrynina, Matvey Sergeevich Kim
727-739
Abstract:

This paper examines the microseismic noise spectra a few hours before moderate and strong seismic events. Forty earthquakes with an energy class of K=9.5–14.5 at epicentral distances of 10 to 120 km were considered. A statistically significant increase in the spectral power density (SPD) was detected in the 0.8–2.4 Hz range. Machine learning methods were used to construct a binary classification model that allows detection of earthquake preparations a few hours before an event based on microseismic SPD values in the specified frequency range.

Keywords: geophysical monitoring, machine learning, digital platform, precursors, seismic forecast, earthquakes.

Application of thinking model in intellectual question-answer systems

Alexander Sergeevich Toschev
222-230
Abstract: We describe the evolution of a thinking model applied to building an intelligent question-answer system that automates the processing of user requests in natural language, starting with simple decision trees and ending with a model of human thinking. Each model was developed, prototyped, and tested. Experimental data and conclusions are provided for each model.
Keywords: artificial intelligence, machine learning, system analysis, machine thinking, natural language processing, decision trees.

Research of Data Processing, Detection and Protection Algorithms to Minimize the Impact of Malware and Phishing Attacks on Users of Digital Platforms

Tatiana Sergeevna Volokitina, Maxim Olegovich Tanygin
187-206
Abstract:

The article is devoted to the development of a scientific and methodological apparatus for improving the effectiveness of protecting digital platforms from cyber threats by creating processing and detection algorithms that take into account the cognitive characteristics of users. A conceptual model of a three-stage protection system is proposed, integrating technical security mechanisms with cognitive decision-making models. A heuristic detection algorithm based on Random Forest machine learning with analysis of 47 features, including technical URL characteristics and cognitive-semantic content characteristics, has been developed. A methodology for dynamic integration of four threat data sources has been created, reducing response time from 12–14 hours to two hours. An algorithm for recursive analysis of redirection chains up to ten levels deep to detect masked threats is proposed. Experimental validation on an empirical base of approximately one million records confirmed detection accuracy of 87% when processing one hundred thousand records per hour. The developed solutions ensure compliance with the requirements of GOST R 57580.1-2017 and Russian legislation in the field of personal data protection.
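The recursive analysis of redirection chains mentioned in this abstract can be sketched offline. The example assumes redirects are already known and stored in a dictionary that maps each URL to the URL it redirects to; the function name, the dictionary, and the loop check are illustrative assumptions, and only the ten-level depth limit comes from the abstract.

```python
def redirect_chain(url, redirects, max_depth=10, _seen=None):
    """Return the URLs visited from url, following at most max_depth redirects."""
    seen = _seen if _seen is not None else []
    seen.append(url)
    next_url = redirects.get(url)
    # Stop at the final page, on a redirect loop, or at the depth limit.
    if next_url is None or next_url in seen or len(seen) > max_depth:
        return seen
    return redirect_chain(next_url, redirects, max_depth, seen)


# Hypothetical redirect table: a short link that hops twice before landing.
example_redirects = {
    "http://short.example/a": "http://tracker.example/b",
    "http://tracker.example/b": "http://landing.example/c",
}
chain = redirect_chain("http://short.example/a", example_redirects)
```

Each URL in the returned chain could then be passed to a detector, so that a threat hidden behind several hops is still examined.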

Keywords: heuristic threat detection, machine learning, cognitive security, phishing attacks, social engineering, data protection, threat source integration.

Creating Pseudowords Generator and Classifier of Their Similarity with Words from Russian Dictionary using Machine Learning

Kirill Alekseevich Romadanskiy, Artemii Evgenyevich Akhaev, Tagmir Radikovich Gilyazov
145-162
Abstract:

In this article, a pseudoword is defined as a unit of speech or text that appears to be a real word in Russian but actually has no meaning. A real or natural word is a unit of speech or text that has an interpretation and is presented in a dictionary. The paper presents two models for working with the Russian language: a generator that creates pseudowords that resemble real words, and a classifier that evaluates the degree of similarity between the entered sequence of characters and real words. The classifier is used to evaluate the results of the generator. Both models are based on recurrent neural networks with long short-term memory layers and are trained on a dataset of Russian nouns. As a result of the research, a file was created containing a list of pseudowords generated by the generator model. These words were then evaluated by the classifier to filter out those that were not similar enough to real words. The generated pseudowords have potential applications in tasks such as name and branding creation, layout design, art, crafting creative works, and linguistic studies for exploring language structure and words.

Keywords: word generation, pseudoword, neural network, recurrent neural network, long short-term memory.

Development of Methods and Software Tools for the Formation of a Digital Portrait of Students

Marat Albertovich Solntsev, Mikhail Mikhailovich Abramskiy
697-717
Abstract:

This paper considers the possibility of using data about students that is available in electronic form to build their digital portraits. A set of characteristics needed to construct a portrait is proposed, and a data model is defined.


Tools for collecting data about students from social networks and other Internet resources are implemented. Algorithms for constructing a digital portrait are proposed. The application of machine learning algorithms to these tasks is illustrated. Examples of the use of digital portraits in education are given.

Keywords: social networks, data retrieval, personal portrait of user, education.

Queries to Non-Relational Data using Natural Language based on a Large Language Model

Adilbek Omirbekovich Erkimbaev, Vladimir Yurievich Zitserman, George Anatolyevich Kobzev
76-98
Abstract:

The main purpose of this work is to explore new opportunities for organizing natural language queries in scientific local databases that are not relational. A brief review of recent research shows that there has been an active introduction of natural language queries into databases of various types, and the use of machine learning methods, such as neural algorithms, is noted. The widespread use of large language models in the last two years for query generation in various language settings and fields of expertise has been demonstrated. A study has been conducted to explore the potential of the AllegroGraph graph database in using large language models for natural language search. The functionality of the database has been examined using the example of a metadata system for thermophysical properties in the form of the "Thermal" domain ontology. Testing search queries in a bilingual (English and Russian) database environment has revealed some general problems that can be overcome, and it gives us good hope for the future application of new services using large language models.

Keywords: natural language query, large language model, embedding, non-relational databases, graph database, domain ontology.

Methods and Algorithms for Increasing Linked Data Expressiveness (Overview)

Olga Avenirovna Nevzorova
808-834
Abstract: This review discusses methods and algorithms for increasing the expressiveness of linked data prepared for Web publication. The main approaches to ontology enrichment are considered, and the methods on which they are based and the tools that implement them are described. The main stage in the general life cycle of linked data in the Linked Open Data cloud is building a set of linked RDF triples. To improve data classification and quality analysis, various methods are used to increase the expressiveness of linked data. The main idea of these methods is the enrichment of existing ontologies (an expansion of the basic knowledge schema) by adding or improving terminological axioms. Enrichment methods draw on various fields, such as knowledge representation, machine learning, statistics, natural language processing, formal concept analysis, and game theory.
Keywords: linked data, ontology, ontology enrichment, semantic web.

Title extraction from English scientific books in PDF format

Dmitry Sergeevich Filippov
392-411
Abstract:

The study is motivated by the weakness of methods proposed by other researchers, which rely on simple heuristics or machine learning algorithms. The purpose of the article is to provide a better way to extract titles from scientific PDF documents and to offer a better-reasoned approach to title selection in general. The approach is to consider as many of the cases and problems that arise during extraction as possible and to find a way to solve all of them. The results showed the efficiency of the chosen approach on a document set exhibiting all of the considered problems. The research highlights that deep analysis of the problems of the task at hand is a promising way to build the best solutions and tools. The article may be useful to researchers and developers who encounter document structural analysis or title detection as a secondary task in a larger program workflow.

Keywords: PDF processing, title extraction, header extraction, strategy based approach, title heuristic, structural analysis, style information, text analysis, document analysis, information extraction.
1 - 25 of 36 items

© 2015-2026 Kazan Federal University; Institute of the Information Society