Published: 14.06.2022

A Mobile System for Collecting a Digital Trace for the Task of Accounting and Analyzing Horizontal Learning in the Learning Process without using a Cellular Connection

Robert Rinatovich Alimbekov, Airat Faridovich Khasyanov
104-120
Abstract:

Today, users of mobile applications in many areas leave a huge digital footprint. The main types of digital footprint are text, photos, video, audio, and current location. To assist the teacher in horizontal learning, a mobile application that collects all of these types of digital footprint was developed, along with a web application that analyzes it.

Development of the Expert System for Building the Architecture of Software Products

Andrey Evgenyevic Grishin, Karen Albertovich Grigorian
121-136
Abstract:

The article is devoted to automating the software design stage. In the course of the study, the reasons for the high importance of this stage and the relevance of its automation were analyzed. Its main steps were also considered, along with existing systems that automate each of them. In addition, the authors propose their own solution to the problem of class-structure refactoring based on a combinatorial optimization method. A solution method was developed to improve the quality of the class hierarchy and was tested on a real model.

Development of a Method for User Segmentation using Clustering Algorithms and Advanced Analytics

Daniil Andreevic Klinov, Karen Albertovich Grigorian
137-147
Abstract:

The article is devoted to creating an effective solution for user segmentation. It presents an analysis of existing user segmentation services, of approaches to user segmentation (ABCDx segmentation, demographic segmentation, segmentation based on a user journey map), and of clustering algorithms (K-means, Mini-Batch K-means, DBSCAN, Agglomerative Clustering, Spectral Clustering). The study of these areas is aimed at creating a “flexible” segmentation solution that adapts to each user sample. Analysis of variance (the ANOVA test) and clustering metrics are also used to assess the quality of user segmentation. Drawing on these analyses, an effective user segmentation solution has been developed using advanced analytics and machine learning.
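The abstract above names several clustering algorithms and clustering-quality metrics. As an illustration only (not the authors' actual pipeline), the sketch below shows one common way such a "flexible" choice can be made: fitting K-means for several candidate cluster counts on synthetic user features and picking the count with the best silhouette score. The feature names and data are invented for the example.

```python
# Illustrative sketch: selecting the number of user segments for K-means
# by silhouette score, one of the clustering-quality metrics such studies use.
# The "user" data here are synthetic; real features would come from analytics.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Two synthetic segments: (sessions per week, average purchase value)
users = np.vstack([
    rng.normal([2, 10], 1.0, size=(100, 2)),   # low-activity users
    rng.normal([10, 50], 2.0, size=(100, 2)),  # high-activity users
])

best_k, best_score = None, -1.0
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(users)
    score = silhouette_score(users, labels)  # higher = better-separated clusters
    if score > best_score:
        best_k, best_score = k, score

print(best_k)  # for this well-separated sample, 2 segments win
```

Because the cluster count is chosen from the data rather than fixed in advance, the same loop adapts to each user sample, which is the kind of flexibility the abstract describes.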

Building a Digital Geological Knowledge Management System to Support Scientific Research

Michail Ivanovich Patuk, Vera Viktorovna Naumova
148-158
Abstract:

The paper describes new approaches to collecting data on scientific publications in the Earth sciences from open-access systems. Based on the developed and adapted approaches, an archive (repository) of scientific publications and a set of programs for collecting, searching, filtering, cataloging, and managing publications and their metadata have been created. To improve the availability of publications and other related data on the websites of the SGM RAS, the Wiki – Geology of Russia system has been developed. The system is a thematic rubric on "Mineral deposits of Russia", with an additional topic, "Mineralogy". Every article must link to its source in the archive of scientific publications and, optionally, to additional materials on similar topics. Wiki – Geology of Russia is the first step toward a knowledge base on mineral deposits.

Development of a Data Validation Module to Satisfy the Retention Policy Metric

Aigul Ildarovna Sibgatullina, Azat Shavkatovich Yakupov
159-178
Abstract:

Every year the global big data market grows. Analyzing these data is essential for good decision-making. Big data technologies bring significant cost reductions through the use of cloud services and distributed file systems when large amounts of information need to be stored. The quality of data analytics depends on the quality of the data themselves. This is especially important if the data are subject to a retention policy and migrate from one source to another, increasing the risk of data loss. Negative consequences of data migration are prevented through data reconciliation – a comprehensive verification of large amounts of information to confirm their consistency.


This article discusses probabilistic data structures that can be used to solve this problem and proposes an implementation: a data integrity verification module based on a Counting Bloom filter. The module is integrated into Apache Airflow to automate its invocation.
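The abstract above builds on a Counting Bloom filter. As a minimal sketch (not the authors' module), the class below shows the core idea: each slot holds a counter rather than a single bit, so records can be removed as well as added, which is what makes the structure usable when data migrate out of a source under a retention policy. The hashing scheme and sizes here are arbitrary choices for illustration.

```python
# Minimal Counting Bloom filter sketch. Unlike a classic Bloom filter,
# each slot is a counter, so deletions are supported. Membership queries
# may yield false positives but never false negatives.
import hashlib

class CountingBloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _indexes(self, item):
        # Derive num_hashes slot indexes by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def remove(self, item):
        # Only meaningful for items previously added.
        for idx in self._indexes(item):
            if self.counters[idx] > 0:
                self.counters[idx] -= 1

    def might_contain(self, item):
        return all(self.counters[idx] > 0 for idx in self._indexes(item))
```

For reconciliation, one filter could be populated from the source rows and queried with the destination rows: any row the filter reports as absent definitely failed to migrate, while reported matches are only probable and may need a direct check.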

Analysis and Development of the MLOps Pipeline for ML Model Deployment

Rustem Raficovich Yamikov, Karen Albertovich Grigorian
177-196
Abstract:

The growing number of IT products with machine-learning features is increasing the relevance of automating machine-learning processes. MLOps techniques aim to provide training and efficient deployment of applications in a production environment by automating ancillary infrastructure concerns that are not directly related to model development.


In this paper, we review the components, principles, and approaches of MLOps and analyze existing platforms and solutions for building machine learning pipelines. In addition, we propose an approach to building a machine learning pipeline based on basic DevOps tools and open-source libraries.