Russian Digital Libraries Journal

Published since 1998
ISSN 1562-5419
Search Results

Image Classification using Convolutional Neural Networks

Sergey Alekseevich Filippov
366-382
Abstract:

Many different tools can now be used to classify images, each aimed at a certain range of tasks. This article provides a brief overview of libraries and technologies for image classification. The architecture of a simple convolutional neural network for image classification is built. Image recognition experiments were conducted with the popular neural networks VGG 16 and ResNet 50. Both networks showed good results. However, ResNet 50 overfitted: the training dataset contained images of the same type, and this network has more layers with which to capture the attributes of objects in the images. A comparative analysis of the trained models was carried out on images specially prepared for this experiment.

Keywords: image recognition, neural network, convolutional neural network, image classification, machine learning.

Solving the Problem of Classifying the Emotional Tone of a Message with Determining the Most Appropriate Neural Network Architecture

Danis Ilmasovich Bagautdinov, Salman Salman, Vladislav Alekseevich Alekseev, Rustamdzhon Murodzhonovich Usmonov
396-413
Abstract:

To determine the most effective approach for solving the task of classifying the emotional tone of a message, we trained selected neural network models on various sets of training data. Next, based on the performance metric of the percentage of correctly classified responses on a test data set, we compared combinations of training data sets and various models trained on them. During the writing of this article, we trained four neural network models on three different sets of training data. By comparing the accuracy of the responses from each model trained on different training data sets, conclusions were drawn regarding the neural network model best suited for solving the task at hand.

Keywords: NLP, sentiment detection, neural networks, comparison of neural network models, LSTM, CNN, BiLSTM.

Steel Defects Analysis Using CNN (Convolutional Neural Networks)

Rodion Dmitrievich Gaskarov, Alexey Mikhailovich Biryukov, Alexey Fedorovich Nikonov, Daniil Vladislavovich Agniashvili, Danil Aydarovich Khayrislamov
1155-1171
Abstract:

Steel is one of the most important bulk materials today. It is used almost everywhere, from medicine to industry. Detecting defects in this material is one of the most challenging problems for industries worldwide, and the process is manual and time-consuming. In this study we tried to automate it. The convolutional neural network model UNet was used because it gives more accurate segmentation with a smaller set of training images. The essence of this neural network (NN) is the step-by-step convolution of every image (encoding), followed by stretching it back to its initial resolution, which yields a mask of the image with the various classes marked on it. The first modification is resizing the input images to 128x800 px (the original images in the dataset are 256x1600 px) because of the GPU memory limit. Second, we used the ResNet34 convolutional neural network (CNN) as the encoder, pre-trained on the ImageNet1000 dataset, with a modified output layer: it shows 4 layers instead of 34. After running tests of this model, we obtained 92.7% accuracy on images of hot-rolled steel sheets.

Keywords: CNN, neural networks, steel, machine learning, AI, Unet, ResNet, defects detection, segmentation, classification.

Applying Machine Learning to the Task of Generating Search Queries

Alexander Michailovich Gusenkov, Alina Rafisovna Sittikova
272-293
Abstract:

In this paper we research two modifications of recurrent neural networks – Long Short-Term Memory networks and networks with Gated Recurrent Unit, with an attention mechanism added to both – as well as the Transformer model in the task of generating queries to search engines. GPT-2 by OpenAI was used as the Transformer, which was trained on user queries. Latent semantic analysis was carried out to identify semantic similarities between the corpus of user queries and queries generated by neural networks. The corpus was converted into a bag-of-words format, the TF-IDF model was applied to it, and a singular value decomposition was performed. Semantic similarity was calculated based on the cosine measure. Also, for a more complete evaluation of the applicability of the models to the task, an expert analysis was carried out to assess the coherence of words in artificially created queries.

Keywords: natural language processing, natural language generation, machine learning, neural networks.
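
As a rough illustration of the similarity pipeline described in the abstract above, the following sketch covers the bag-of-words, TF-IDF, and cosine-measure steps only. It is not the authors' code: the singular value decomposition step is left out to keep the example dependency-free, the smoothed IDF formula is an assumed variant, and the queries and the names `tfidf_vectors` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Turn whitespace-tokenized documents into sparse TF-IDF vectors (dicts)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant; the abstract does not specify one).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical data: two "user" queries and one "generated" query.
user_queries = ["cheap flights to kazan", "weather in kazan today"]
generated_query = "cheap flights kazan"
vectors = tfidf_vectors(user_queries + [generated_query])
similarities = [cosine_similarity(vectors[-1], v) for v in vectors[:-1]]
print(similarities)
```

In this toy case the generated query scores higher against the first user query, with which it shares three terms, than against the second, with which it shares one. A full latent semantic analysis would project the TF-IDF vectors into a low-rank space before taking the cosine.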

Application of the Douglas-Peucker Algorithm in Online Authentication of Remote Work Tools for Specialist Training in Higher Education Group of Scientific Specialties (UGSN) 10.00.00

Anton Grigorievich Uymin, Vladimir Sergeyevich Grekov
679-694
Abstract:

In today's world, digital technologies are penetrating all aspects of human activity, including education and labor. Since 2019, when the world's educational systems began shifting actively to distance learning in response to global challenges, there has been an urgent need to develop and implement reliable identification and authentication technologies. These technologies are necessary to ensure the authenticity of work and to protect against falsification of academic achievements, especially in higher education within the group of specialties and directions (UGSN) 10.00.00 - Information Security, where laboratory and practical work play a key role in the educational process.


The problem lies in the need to optimize the flow of incoming data, which, first, can affect the retraining of the neural network core of the recognition system, and second, impose excessive requirements on the network's bandwidth. To solve this problem, efficient preprocessing of gesture data is required to simplify their trajectories while preserving the key features of the gestures.


This article proposes the use of the Douglas–Peucker algorithm for preliminary processing of mouse gesture trajectory data. This algorithm significantly reduces the number of points in the trajectories, simplifying them while preserving the main shape of the gestures. The data with simplified trajectories are then used to train neural networks.


The experimental part of the work showed that the application of the Douglas–Peucker algorithm allows for a 60% reduction in the number of points in the trajectories, leading to an increase in gesture recognition accuracy from 70% to 82%. Such data simplification contributes to speeding up the neural networks' training process and improving their operational efficiency.


The study confirmed the effectiveness of using the Douglas–Peucker algorithm for preliminary data processing in mouse gesture recognition tasks. The article suggests directions for further research, including the optimization of the algorithm's parameters for different types of gestures and exploring the possibility of combining it with other machine learning methods. The obtained results can be applied to developing more intuitive and adaptive user interfaces.

Keywords: authentication, biometric identification, remote work, distance learning, Douglas–Peucker algorithm, data preprocessing, neural network, HID devices, mouse gesture trajectories, data optimization.
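
The trajectory simplification described in the abstract above can be sketched as follows. This is a minimal recursive Douglas–Peucker implementation under stated assumptions, not the authors' code: the sample trajectory, the tolerance values, and the function names are hypothetical, and distance is measured to the infinite line through the two endpoints.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate case: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Drop points that lie within epsilon of the simplified polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = douglas_peucker(points[: index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right
    # Every interior point is close enough to the chord: keep the endpoints only.
    return [first, last]


# Hypothetical mouse trajectory: a flat stroke with one sharp spike.
trajectory = [(0, 0), (1, 0), (2, 0), (3, 4), (4, 0), (5, 0), (6, 0)]
simplified = douglas_peucker(trajectory, epsilon=0.5)
print(simplified)
```

With `epsilon=0.5` the collinear points at x = 1 and x = 5 are dropped while the spike is preserved. A larger tolerance removes more points, which is the parameter the abstract suggests tuning per gesture type.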

Neural Network Architecture of Embodied Intelligence

Ayrat Rafkatovich Nurutdinov
598-655
Abstract:

In recent years, progress in artificial intelligence (AI) and machine learning has been driven by advances in large language models (LLMs) based on deep neural networks. At the same time, despite their substantial capabilities, LLMs have fundamental limitations: spontaneous unreliability in facts and judgments; simple errors that are dissonant with their high general competence; credulity, manifested as a willingness to accept a user's knowingly false claims as true; and lack of knowledge about events that occurred after training was completed.


Probably the key reason is that bioinspired intelligence learning occurs through the assimilation of implicit knowledge by an embodied form of intelligence to solve interactive real-world physical problems. Bioinspired studies of the nervous systems of organisms suggest that the cerebellum, which coordinates movement and maintains balance, is a prime candidate for uncovering methods for realizing embodied physical intelligence. Its simple repetitive structure and ability to control complex movements offer hope for the possibility of creating an analog to adaptive neural networks.


This paper explores the bioinspired architecture of the cerebellum as a form of analog computational networks capable of modeling complex real-world physical systems. As a simple example, a realization of embodied AI in the form of a multi-component model of an octopus tentacle is presented, demonstrating the potential in creating adaptive physical systems that learn and interact with the environment.

Keywords: artificial neural network, large language model, implicit learning, cerebellum model, analog computing, embodied cognition, soft robotics, octopus.

Using adjacency matrices for visualization of large graphs

Zinaida Vladimirovna Apanovich
2-36
Abstract: The exponential growth in the size of graphs such as social networks and Internet graphs requires new approaches to their visualization. Along with node-link diagrams, adjacency matrices and various hybrid representations are increasingly used to visualize large graphs. This survey discusses new approaches to the visualization of large graphs using adjacency matrices and gives examples of applications where these approaches are used. We describe various types of patterns that arise when adjacency matrices corresponding to modern networks are ordered, and algorithms that make it possible to reveal these patterns. In particular, the use of matrix ordering methods in conjunction with algorithms that look for such graph patterns as stars, false stars, chains, near-cliques, full cliques, bipartite cores and near-bipartite cores enables users to create understandable visualizations of graphs with millions of vertices and edges. Examples of hybrid visualizations that use node-link diagrams for the sparse parts of a graph and adjacency matrices for the dense parts are also given. The hybrid methods are used to visualize co-authorship networks and deep neural networks, to compare networks of human brain connectivity, etc.
Keywords: large graphs, visualization, adjacency matrices, edge bundles, hybrid visualization.

Generation of Three-Dimensional Synthetic Datasets

Vlada Vladimirovna Kugurakova, Vitaly Denisovich Abramov, Daniil Ivanovich Kostiuk, Regina Airatovna Sharaeva, Rim Radikovich Gazizova, Murad Rustemovich Khafizov
622-652
Abstract:

The work is devoted to the description of the process of developing a universal toolkit for generating synthetic data for training various neural networks. The approach used has shown its success and effectiveness in solving various problems, in particular, training a neural network to recognize shopping behavior inside stores through surveillance cameras and training a neural network for recognizing spaces with augmented reality devices without using auxiliary infrared cameras. Generalizing conclusions allow planning the further development of technologies for generating three-dimensional synthetic data.

Keywords: synthetic data, synth data, dataset, artificial intelligence, AI, neural networks, NN, machine learning, ML, computer vision, three-dimensional models, 3D, metahuman, game engine, unreal engine, UE.

Real-Time Generative Simulation of Game Environment

Eduard Sergeevich Bolshakov, Vlada Vladimirovna Kugurakova
188-212
Abstract:

This paper explores the potential of generative neural network simulations, focusing on the application of reinforcement learning methods and neural world models for creating interactive worlds. Key achievements in agent training using reinforcement learning are discussed. Special attention is given to neural world models, as well as generative models such as Oasis, DIAMOND, Genie, and GameNGen, which employ diffusion networks to generate realistic and interactive game worlds. The opportunities and limitations of generative simulation models are examined, including issues related to error accumulation and memory constraints, and their impact on the quality of generation. The conclusion presents suggestions for future research directions.

Keywords: video games, game environment, generative simulation, reinforcement learning, generative neural networks, gameplay simulation, world models.

Of Neural Network Model Robustness Through Generating Invariant to Attributes Embeddings

Marat Rushanovich Gazizov, Karen Albertovich Grigorian
1142-1154
Abstract:

Model robustness to minor deviations in the distribution of input data is an important criterion in many tasks. Neural networks show high accuracy on training samples, but quality on test samples can drop dramatically because of different data distributions, a situation that is exacerbated at the subgroup level within each category. In this article we show how the robustness of the model at the subgroup level can be significantly improved by applying the domain adaptation approach to image embeddings. We have found that applying a competitive approach to constraining embeddings gives a significant increase in accuracy metrics in a complex subgroup compared with previous models. The method was tested on two independent datasets: the accuracy in a complex subgroup is 90.3 {y : waterbirds;a : landbackground} on the Waterbirds dataset and 92.22 {y : blondhair;a : male} on the CelebA dataset.

Keywords: robust classification, image classification, generative adversarial networks, domain adaptation.

Human Fatigue Evaluation by Face's Image Analysis Based upon Convolutional Neural Networks

Bairamov Azat Ilgizovich, Faskhutdinov Timur Ruslanovich, Timergalin Denis Marselevich, Yamikov Rustem Raficovich, Murtazin Vitaly Rudolfovich, Nikita Alekseevich Tumanov
582-603
Abstract:

This article presents solutions to the problem of recognizing a person's fatigue by analyzing face images with convolutional neural networks. Existing algorithms are considered, a new model architecture is proposed and implemented, and the resulting metrics of the model are evaluated.

Keywords: fatigue level, convolutional neural networks, machine learning, ResNet-152v2, facial fatigue evaluation, fatigue recognition, image processing.

The low level implementation of noradrenaline pathways via spiking neural networks

Vladislav Pishchulin, Maxim Olegovich Talanov
216-237
Abstract: The noradrenaline pathways play an important role in emotional appraisal and feedback, as well as in decision-making. We present a software system capable of automatically generating PyNEST code from a high-level description of neuronal pathways.
Keywords: NEST, NeuCogAR, Lövheim's cube, noradrenaline.

The system of emotional appraisal based on reinforcement learning and bio-inspired methods

Evgeniya Yurievna Mayorova, Maxim Olegovich Talanov, Robert Lowe
193-215
Abstract: I research and lecture in Cognitive Science where my particular interest is in emotions – neural networks modeling and applications – and animal and human learning.
Keywords: appraisal, emotional appraisal, reinforcement learning.

Image Classification Using Reinforcement Learning

Artem Aleksandrovich Elizarov, Evgenii Viktorovich Razinkov
1172-1191
Abstract:

Reinforcement learning, a branch of machine learning, has recently been developing actively. As a consequence, attempts are being made to use reinforcement learning to solve computer vision problems, in particular the problem of image classification. Computer vision tasks are currently among the most pressing tasks in artificial intelligence.


The article proposes a method for image classification in the form of a deep neural network using reinforcement learning. The idea of the developed method comes down to solving the contextual multi-armed bandit problem using various strategies for balancing exploration and exploitation, together with reinforcement learning algorithms. Strategies such as ε-greedy, ε-softmax, ε-decay-softmax, and the UCB1 method, and reinforcement learning algorithms such as DQN, REINFORCE, and A2C are considered. The influence of various parameters on the efficiency of the method is analyzed, and options for further development of the method are proposed.

Keywords: machine learning, image classification, reinforcement learning, contextual multi-armed bandit problem.
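
To make the bandit framing in the abstract above concrete, here is a minimal tabular sketch of an ε-greedy strategy in which each context stands for an image, each arm for a class label, and the reward is 1 for a correct label. It is an illustration under these assumptions only: the paper uses a deep neural network and algorithms such as DQN, REINFORCE, and A2C, none of which appear here, and all names and values below are hypothetical.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon explore a random arm, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda arm: q_row[arm])


def train_bandit(true_labels, n_classes, steps=4000, epsilon=0.1, seed=0):
    """Learn a per-context value table; reward is 1 when the chosen label is correct."""
    rng = random.Random(seed)
    n_contexts = len(true_labels)
    q = [[0.0] * n_classes for _ in range(n_contexts)]
    counts = [[0] * n_classes for _ in range(n_contexts)]
    for _ in range(steps):
        ctx = rng.randrange(n_contexts)  # a toy "image" arrives
        arm = epsilon_greedy(q[ctx], epsilon, rng)  # the agent picks a label
        reward = 1.0 if arm == true_labels[ctx] else 0.0
        counts[ctx][arm] += 1
        # Incremental mean of the rewards observed for this (context, arm) pair.
        q[ctx][arm] += (reward - q[ctx][arm]) / counts[ctx][arm]
    return q


# Hypothetical setup: four contexts with known labels drawn from three classes.
TRUE_LABELS = [2, 0, 1, 2]
q_table = train_bandit(TRUE_LABELS, n_classes=3)
greedy_predictions = [max(range(3), key=lambda arm: row[arm]) for row in q_table]
print(greedy_predictions)
```

The exploration rate controls how quickly the correct label for each context is discovered. Setting `epsilon` to 0 makes the choice purely greedy, which is why some exploration is needed when all initial estimates are tied.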

Procedural Methods for Skinning Humanoid Characters

Rim Radikovich Gazizov, Aleksey Vitalevich Shubin
404-440
Abstract:

The procedure for setting vertex weights is a very time consuming and difficult task for any 3D model artist. Therefore, the use of procedural methods to facilitate this procedure is very important.


This article analyzes various skinning techniques and identifies their advantages and disadvantages. The most frequent skinning defects that arise with standard approaches are described. Skinning tools in the Maya 3D modeling environment are analyzed. Methods for solving some of the existing problems are proposed, although they do not amount to a procedural solution. In addition, the authors propose an idea for their own neural-network-based solution as an additional tool for Maya. This tool is expected to overcome most of the disadvantages of other methods and speed up the skinning of a model.

Keywords: 3D modeling, vertexes, rigging, neural networks.

On the Approach to Detecting Pedestrian Movement using the Method of Histograms of Oriented Gradients

Maxim Vladimirovich Bobyr, Natalya Anatol'evna Milostnaya, Natalia Igorevna Khrapova
429-447
Abstract:

An approach to automatically recognizing the movement of people at a pedestrian crossing is presented in the article. The approach includes two main procedures, for each of which program code is given in the C# programming language using the EMGU computer vision library. In the first procedure, pedestrians are detected using a combination of the histogram of oriented gradients and support vector machine methods. The second procedure reads frames from a video sequence and processes them. This approach detects the movements of people at a pedestrian crossing without using specialized neural networks. At the same time, the proposed method demonstrated sufficient reliability of human movement recognition, which indicates its applicability in real conditions.

Keywords: Pedestrian Motion Recognition, EMGU, Histogram of Oriented Gradients, Support Vector Machine.
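
The core of the histogram of oriented gradients descriptor mentioned in the abstract above can be sketched for a single cell as follows. This is a simplified illustration, not the authors' C# code and not the EMGU API: it omits bin interpolation, block normalization, and the support vector machine classifier, and the cell data and function name are hypothetical.

```python
import math


def orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees)."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    # Central differences on interior pixels only; border pixels are skipped.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The trailing modulo guards against an angle that rounds up to 180.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# Hypothetical 6x6 cells: brightness rising left to right, and top to bottom.
horizontal_ramp = [[10 * x for x in range(6)] for _ in range(6)]
vertical_ramp = [[10 * y for _ in range(6)] for y in range(6)]
print(orientation_histogram(horizontal_ramp))
print(orientation_histogram(vertical_ramp))
```

In a full detector, histograms like these are computed over a grid of cells, normalized in overlapping blocks, concatenated into one feature vector, and passed to a linear classifier.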

On the Synonym Search Model

Olga Muratovna Ataeva, Vladimir Alekseevich Serebriakov, Natalia Pavlovna Tuchkova
1006-1022
Abstract:

The problem of finding the most relevant documents as a result of an extended and refined query is considered. For this, a search model and a text preprocessing mechanism are proposed, as well as the joint use of a search engine and a neural network model built on the basis of an index using word2vec algorithms to generate an extended query with synonyms and refine search results based on a selection of similar documents in a digital semantic library. The paper investigates the construction of a vector representation of documents based on paragraphs, applied to the data array of the digital semantic library LibMeta. Each piece of text is labeled; both the whole document and its separate parts can be marked. The problem of enriching user queries with synonyms was solved. When building the search model together with word2vec algorithms, an "indexing first, then training" approach was used to cover more information and give more accurate search results. The model was trained on the library's mathematical content. Examples of training, extended queries, and search quality assessment using training and synonyms are given.

Keywords: search model, word2vec algorithm, synonyms, information query, query extension.
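
Query extension with synonyms, as described in the abstract above, can be sketched by adding each query term's nearest neighbor in an embedding space. The sketch below is not the LibMeta implementation: the three-dimensional vectors are hand-made stand-ins for trained word2vec embeddings, and the similarity threshold and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(query_terms, embeddings, top_k=1, min_similarity=0.8):
    """Append up to top_k sufficiently similar neighbors of each query term."""
    expanded = list(query_terms)
    for term in query_terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are kept but not expanded
        scored = sorted(
            ((cosine(embeddings[term], vec), other)
             for other, vec in embeddings.items() if other != term),
            reverse=True,
        )
        for score, other in scored[:top_k]:
            if score >= min_similarity and other not in expanded:
                expanded.append(other)
    return expanded


# Hand-made toy vectors; a real system would load trained word2vec embeddings.
TOY_EMBEDDINGS = {
    "equation": (0.9, 0.1, 0.0),
    "formula": (0.85, 0.2, 0.05),
    "theorem": (0.1, 0.9, 0.1),
    "lemma": (0.15, 0.85, 0.2),
    "integral": (0.0, 0.1, 0.95),
}
expanded_query = expand_query(["equation", "integral"], TOY_EMBEDDINGS)
print(expanded_query)
```

Here "equation" gains the neighbor "formula", while "integral" has no neighbor above the threshold and is left unexpanded. The threshold keeps loosely related terms from diluting the query.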

Controlled Face Generation System using StyleGAN2 Neural Network

Marat Isangulov, Razil Minneakhmetov, Almaz Khamedzhanov, Timur Khafizyanov, Emil Pashaev, Ernest Kalimullin
466-482
Abstract:

A novel approach to supervised face generation using open-source generative models including StyleGAN2 and Ridge Regression is presented. A methodology that extends StyleGAN2 to control facial characteristics such as age, race, gender, facial expression, and hair attributes is developed, and an extensive dataset of human faces with attribute annotations is utilized. The faces were encoded in 256-dimensional latent space using the StyleGAN2 encoder, resulting in a set of characteristic latent codes. We applied the t-SNE algorithm to cluster these feature-based codes, demonstrated the ability to control face generation, and subsequently trained Ridge regression models for each dimension of the latent codes using the labeled features. When decoded using StyleGAN2, the resulting codes successfully reconstructed face images while maintaining the association with the input features. The developed approach provides an easy and efficient way to supervised face generation using existing generative models such as StyleGAN2, and opens up new possibilities for different application areas.

Keywords: machine learning, face generation, StyleGan, encoder, decoder, latent codes, feature mapping, ridge regression.

Creating Pseudowords Generator and Classifier of Their Similarity with Words from Russian Dictionary using Machine Learning

Kirill Alekseevich Romadanskiy, Artemii Evgenyevich Akhaev, Tagmir Radikovich Gilyazov
145-162
Abstract:

In this article, a pseudoword is defined as a unit of speech or text that appears to be a real word in Russian but actually has no meaning. A real or natural word is a unit of speech or text that has an interpretation and is presented in a dictionary. The paper presents two models for working with the Russian language: a generator that creates pseudowords that resemble real words, and a classifier that evaluates the degree of similarity between the entered sequence of characters and real words. The classifier is used to evaluate the results of the generator. Both models are based on recurrent neural networks with long short-term memory layers and are trained on a dataset of Russian nouns. As a result of the research, a file was created containing a list of pseudowords generated by the generator model. These words were then evaluated by the classifier to filter out those that were not similar enough to real words. The generated pseudowords have potential applications in tasks such as name and branding creation, layout design, art, crafting creative works, and linguistic studies for exploring language structure and words.

Keywords: word generation, pseudoword, neural network, recurrent neural network, long short-term memory.

Neural Network for Generating Images Based on Song Lyrics using OpenAI and CLIP Models

Alsu Rishatovna Davletgareeva, Ksenia Aleksandrovna Edkova
437-455
Abstract:

The effectiveness of the ImageNet diffusion model and CLIP models for image generation based on textual descriptions was investigated. Two experiments were conducted using various textual inputs and different parameters to determine the optimal settings for generating images from text descriptions. The results showed that while ImageNet performed well in generating images, CLIP demonstrated better alignment between textual prompts and relevant images. The obtained results highlight the high potential of combining these mentioned models for creating high-quality and contextually relevant images based on textual descriptions.

Keywords: image generation, artificial intelligence, ImageNet diffusion model, CLIP, deep learning, neural networks, natural language processing.

The low level implementation of the noradrenaline pathways of spiking neural network

Yuliya Sergeevna Safandeeva, Maxim Olegovich Talanov
251-286
Abstract: We propose to re-implement the basic emotions described by Silvan Tomkins via the "Cube of emotions" of Hugo Lövheim and spiking NNs. We use the basic mechanism of noradrenaline neuromodulation and map it to the influence over computational processes in modern computers. We implement noradrenaline pathways via the neurobiological simulator NEST.
Keywords: NEST, Lövheim, nor-adrenaline, NeuCogAR.

Automatic Annotation of Training Datasets in Computer Vision using Machine Learning Methods

Aleksey Konstantinovich Zhuravlev, Karen Albertovich Grigorian
718-729
Abstract:

This paper addresses the issue of automatic annotation of training datasets in the field of computer vision using machine learning methods. Data annotation is a key stage in the development and training of deep learning models, yet the process of creating labeled data often requires significant time and labor. This paper proposes a mechanism for automatic annotation based on the use of convolutional neural networks (CNN) and active learning methods.


The proposed methodology includes the analysis and evaluation of existing approaches to automatic annotation. The effectiveness of the proposed solutions is assessed on publicly available datasets. The results demonstrate that the proposed method significantly reduces the time required for data annotation, although operator intervention is still necessary.


The literature review includes an analysis of modern annotation methods and existing automatic systems, providing a better understanding of the context and advantages of the proposed approach. The conclusion discusses achievements, limitations, and possible directions for future research in this field.

Keywords: computer vision, machine learning, automatic data annotation, training datasets, image segmentation.

Data Extraction from Similarly Structured Scanned Documents

Rustem Damirovich Saitgareev, Bulat Rifatovich Giniyatullin, Vladislav Yurievich Toporov, Artur Aleksandrovich Atnagulov, Farid Radikovich Aglyamov
667-688
Abstract:

Currently, the major part of transmitted and stored data is unstructured, and the amount of unstructured data is growing rapidly each year, although it is hardly searchable, unqueryable, and its processing is not automated. At the same time, there is a growth of electronic document management systems. This paper proposes a solution for extracting data from paper documents considering their structure and layout based on document photos. By examining different approaches, including neural networks and plain algorithmic methods, we present their results and discuss them.

Keywords: neural networks, document structure.
1 - 23 of 23 items
© 2015-2025 Kazan Federal University; Institute of the Information Society