Russian Digital Libraries Journal

Published since 1998
ISSN 1562-5419
16+

Search Results

Procedure for Comparing Text Recognition Software Solutions For Scientific Publications by the Quality of Metadata Extraction

Ilia Igorevich Kuznetsov, Oleg Panteleevich Novikov, Dmitry Yurievich Ilin
654-680
Abstract:

Metadata of scientific publications are used to build catalogs, count citations of publications, and perform other tasks. Automating metadata extraction from PDF files speeds up these tasks, while the possibility of further use of the obtained data depends on the quality of extraction. Existing software solutions were analyzed, after which three of them were selected: GROBID, CERMINE, and ScientificPdfParser. A procedure for comparing software solutions for recognizing texts of scientific publications by the quality of metadata extraction is proposed. Based on the procedure, an experiment was conducted to extract 4 types of metadata (title, abstract, publication date, author names). To compare the software solutions, a dataset of 112,457 publications divided into 23 subject areas, formed on the basis of Semantic Scholar data, was used. An example is given of choosing an effective software solution for metadata extraction, using a weighted sum, under specified priorities for subject areas and types of metadata. It was determined that for the given example CERMINE shows efficiency 10.5% higher than GROBID and 9.6% higher than ScientificPdfParser.

Keywords: text recognition, scientific publications, metadata, data extraction quality, procedure.
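
The weighted-sum selection mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the scoring rule here (quality multiplied by a subject-area weight and a metadata-type weight, then summed) is an assumption, and the function names `weighted_score` and `rank_tools` are hypothetical.

```python
from typing import Dict, List, Tuple

# (subject_area, metadata_type) -> extraction quality in [0, 1]
Quality = Dict[Tuple[str, str], float]


def weighted_score(
    quality: Quality,
    area_weights: Dict[str, float],
    type_weights: Dict[str, float],
) -> float:
    """Weighted sum of quality values (assumed form, not the paper's formula)."""
    total = 0.0
    for (area, mtype), q in quality.items():
        # Missing weights count as zero priority.
        total += area_weights.get(area, 0.0) * type_weights.get(mtype, 0.0) * q
    return total


def rank_tools(
    qualities: Dict[str, Quality],
    area_weights: Dict[str, float],
    type_weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Return (tool, score) pairs sorted from best to worst."""
    scores = {
        tool: weighted_score(q, area_weights, type_weights)
        for tool, q in qualities.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Changing the weights changes the ranking, which is why the result reported in the abstract holds only for the priorities chosen in that example.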

Data Extraction from Similarly Structured Scanned Documents

Rustem Damirovich Saitgareev, Bulat Rifatovich Giniyatullin, Vladislav Yurievich Toporov, Artur Aleksandrovich Atnagulov, Farid Radikovich Aglyamov
667-688
Abstract:

Currently, the major part of transmitted and stored data is unstructured, and the amount of unstructured data is growing rapidly each year, although it is hardly searchable, unqueryable, and its processing is not automated. At the same time, there is a growth of electronic document management systems. This paper proposes a solution for extracting data from paper documents considering their structure and layout based on document photos. By examining different approaches, including neural networks and plain algorithmic methods, we present their results and discuss them.

Keywords: neural networks, document structure.

The Using of DVM-System for Developing of a Program for Calculations of the Problem of Radiation Magnetic Gas Dynamics and Research of Plasma Dynamics in the QSPA Channel

Vladimir Aleksandrovich Bakhtin, Dmitry Aleksandrovich Zakharov, Andrey Nikolaevich Kozlov, Veniamin Sergeevich Konovalov
594-614
Abstract: The DVM-system is designed for the development of parallel programs for scientific and technical calculations in the C-DVMH and Fortran-DVMH languages. These languages share a single parallel programming model, the DVMH model, and extend the standard C and Fortran languages with parallelism specifications in the form of compiler directives. The DVMH model makes it possible to create efficient parallel programs for heterogeneous computing clusters whose nodes can use accelerators, graphics processors, or Intel Xeon Phi coprocessors as computing devices along with universal multi-core processors. The article describes the experience of successfully using the DVM-system to develop parallel software for calculating the problem of radiation magnetic gas dynamics and for research of plasma dynamics in the QSPA channel.
Keywords: automation of development of parallel programs, DVM-system, plasma accelerator, radiation magnetic gas dynamics.

Software and technologies for geoportal of ICM SB RAS

О.Э. Якубайлик, А.А. Кадочников, А.Г. Матвеев, А.С. Пятаев, А.В. Токарев
Abstract: Research on the design and development of software and technological support for the geoportal of ICM SB RAS is discussed. Its main components and implementation details are presented. A number of problems are discussed in detail, such as the web-based metadata catalog and the logic of building applications based on the geoportal's web services. A list of implemented information systems based on the discussed technologies is presented. The authors were directly involved in the development and implementation of geoportal-based projects. Many different software libraries and components were used in developing the software. The web mapping user interface was created using a number of open source libraries. To create the server-side web application, the authors used the GIS platforms MapGuide Open Source and Minnesota MapServer. GeoWebCache was another essential component of the distributed web mapping applications. Analyzing and summarizing the experience gained in creating information systems, the authors note that the use of geoportal-based solutions in this area can dramatically improve the efficiency of software development and problem solving.
Keywords: geoportal, geographic data, metadata catalog, user interface, map applications, web services.

Computed knowledge base for description of information resources of molecular spectroscopy. 4. Software

А.Ю. Ахлёстин, А.В. Козодоев, Н.А. Лаврентьев, А.И. Привезенцев, А.З Фазлиев
Abstract: Middleware and applied software for the development of information-computational three-layer architecture system on molecular spectroscopy is described. The article is mainly focused on applied software of information layer and knowledge layer. It also describes basic classes and packages of classes used for the realization of software solutions of dozens of information tasks related to import, comparison, representation and decomposition of data sources as well as representing publications' models in quantitative spectroscopy.
Keywords: molecular spectroscopy, description of information resources, applied software, three-layer architecture system on molecular spectroscopy, basic classes, packages of classes, publications' models.

About Measuring of the Contribution of Software Decisions to Program Performance

Lidia Vasiljevna Gorodnyaya, Tatiana Anatolevna Andreyeva
738-759
Abstract:

The article draws attention to the problem of measuring the effect that programming solutions have on programming productivity and program performance, in educational programming and in correctness-preserving program improvements. The results of some experiments concerning these questions are discussed. The hypothesis is proposed that functional models can provide a metric scale capable of separating features of programming languages and systems from features of programs and programming solutions. The results of a preliminary demonstrative experiment are described, studying the dependence of program performance on the chosen compiler and, on the other hand, on the representation of the programming solution in the chosen programming languages. Analysis of these results leads to a method that can reveal such dependencies. Long experience in reviewing educational and contest programs revealed some previously unnoticed aspects of this problem.

Keywords: program quality measurements, programming productivity, program performance, programming decisions, functional programming.

Development of the Expert System for Building the Architecture of Software Products

Andrey Evgenyevic Grishin, Karen Albertovich Grigorian
121-136
Abstract:

The article is devoted to automation of the software design stage. In the course of the study, the reasons for the high importance of this stage and the relevance of its automation were analyzed. The main steps of this stage and the existing systems that automate each of them were also considered. In addition, the authors propose their own solution to the problem of class structure refactoring, based on a combinatorial optimization method. A solution method that improves the quality of the class hierarchy has been developed and tested on a real model.

Keywords: automation, design, refactoring, software architecture, OOP, optimization.

Creating a Data Processing Ecosystem for Geological Research

Vitaliy Sergeevich Eremenko, Vera Viktorovna Naumova
336-347
Abstract:

This paper discusses heterogeneous geographically distributed computing systems for processing geological data and approaches to organizing interaction with these systems. The systems are classified by the authors into a number of groups based on the main functional capabilities and technological solutions. A description of the main properties for each type of systems is given, including possible ways for interaction.


An approach is proposed for organizing a single workspace with access to heterogeneous geographically distributed computing systems within the ecosystem developed by the authors. The architecture of the proposed solution and the rules of interaction for its participants are described. A software prototype is demonstrated that implements the described principles on the example of several heterogeneous systems for processing geological information.

Keywords: computing and analytical environment, cloud services, web services, software platforms.

Creating a comparison method for relational tables

Azat Shavkatovich Yakupov, Daniil Andreevich Klinov
173-183
Abstract: The article is devoted to creating a quick method for comparing a large number of data tables in relational database management systems. An effective method for comparing relational tables is highly relevant today. Existing solutions were studied. The algorithm in this article was created using the probabilistic data structure "Countable Bloom filter" and the Monte Carlo method. The proposed solution is unique in its field, as it uses the least amount of time. A probabilistic model of the created algorithm is constructed; the algorithm is suitable for parallelization.
Keywords: multiset, comparison of relational tables, heterogeneous system, Countable Bloom filter, Monte Carlo method, replication, Oracle, PostgreSQL, Probabilistic data structure.
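
The general idea of comparing tables as multisets of rows with a counting Bloom filter can be sketched as follows. This is not the authors' algorithm (the abstract does not specify it, and the Monte Carlo component is not covered here); the class `CountingBloomFilter`, the helper names, the salted SHA-256 hashing scheme, and the default sizes are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable, List, Tuple


class CountingBloomFilter:
    """Bloom filter with integer counters, so repeated rows are counted."""

    def __init__(self, size: int = 1024, num_hashes: int = 3) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.counters: List[int] = [0] * size

    def _positions(self, item: bytes) -> List[int]:
        # Derive num_hashes counter positions from salted SHA-256 digests.
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + item).digest()[:8], "big")
            % self.size
            for i in range(self.num_hashes)
        ]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1


def build_filter(
    rows: Iterable[Tuple], size: int = 1024, num_hashes: int = 3
) -> CountingBloomFilter:
    """Insert every row (serialized with repr) into a fresh filter."""
    cbf = CountingBloomFilter(size, num_hashes)
    for row in rows:
        cbf.add(repr(row).encode("utf-8"))
    return cbf


def tables_probably_equal(rows_a: Iterable[Tuple], rows_b: Iterable[Tuple]) -> bool:
    """False means the multisets differ; True means they match up to collisions."""
    return build_filter(rows_a).counters == build_filter(rows_b).counters
```

Because the counters are additive, filters built independently over partitions of a table can be merged by element-wise addition, which is one way such a comparison could be parallelized.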

Temperature Distribution at the Border Astenosphere–Lithosphere (Mathematical Model)

Alexander Naumovich Chetyrbotsky
376-401
Abstract:

The convection of matter in the Earth's upper mantle is considered, which in the Oberbeck–Boussinesq approximation is due to thermogravitational differentiation. Within the framework of this approximation, a 2-D numerical simulation of convective flows of the medium was performed. The equation for temperature follows from the entropy balance relation, which includes an energy dissipation term because the variable viscosity of the system is taken into account. The boundary conditions assign the generally accepted temperature at the boundary of the upper and lower mantles and thermal insulation at the lateral boundaries. At the asthenosphere–lithosphere boundary, it is assumed that the heat dynamics is determined by the heat flow from the asthenosphere layer closest to the boundary, partial heat dissipation along the boundary, and heat consumption for melting the lithosphere matter. The constitutive equations are solved numerically in stream function-vorticity variables. An iterative scheme for their solution is given. The issues of software implementation of the numerical simulation apparatus are discussed. It is shown that under such boundary conditions, a quasi-periodic regime of heat oscillations is formed in the system under consideration.

Keywords: asthenosphere, Oberbeck–Boussinesq approximation, mantle convection, boundary conditions, numerical algorithm.

Analysis of Software System Optimization using the Example of Free Automated Library and Information Systems

Oleg Ivanovich Vasyliev, Valentin Yurevich Medvedev
151-163
Abstract:

This article studies the possibilities of optimizing the operability and improving the efficiency of complex multifunctional software systems, using the example of free automated library and information systems (hereinafter ALIS).


By 2023, the world has accumulated valuable experience in the creation and operation of integrated ALIS of various scales and purposes, but the issues of improving their design solutions remain relevant. First of all, this concerns the need to optimize the structure of the source code in order to increase its readability and maintainability, reduce the execution time of individual functional modules, and reduce the amount of RAM used.


As part of the study, a comparative analysis of the source codes of several existing open source databases implemented in various programming languages was carried out. The main approaches to the design of the code structure were studied, the most frequently used algorithms and patterns were identified. To assess the degree of optimization of the source code, a set of indicators was developed, including an assessment of the structure, readability, modularity and other characteristics. On this basis, individual code fragments were compared before and after the use of well-known refactoring techniques.


As a result of the work carried out, it was possible to identify the most common errors and shortcomings in the structuring of the source codes of the ALIS, to determine the main directions of their optimization. Data has been obtained on the possible reduction of testing and technical support costs by improving the quality of source codes.

Keywords: software code correction, software system optimization, refactoring, multilingual system, software system quality assessment, automated library and information systems, software development process.

Automated System for Selecting Optimal Methods for Solving Acoustic Problems Based on Ontology

Irina Leonidovna Artemieva, Alina Evgenevna Chusova
719-737
Abstract:

The report presents a software package that will allow specialists in the field of architectural acoustics to choose the most appropriate methods for modeling sound and selecting finishing materials depending on the tasks and parameters of a building. A distinctive feature of this system is the presence of an ontology of the subject area that describes the terms and relationships between concepts, as well as modules for solving various problems in the field of architectural acoustics. Such an approach will allow the system to recommend to the user the most suitable simulation methods for their request by considering the specifics of the premises and the functional requirements of the client. The on-demand software system makes it possible to optimize and parallelize programs written in a domain-specific programming language. The paper describes the principles of source code analysis used to identify critical areas and modify them using a bank of patterns. The report also discusses an approach to developing a domain-specific programming language based on a domain ontology, ODSL (Ontology-Based Domain-Specific Language), which allows specialists to describe algorithms without accounting for specific optimization and parallelization methods. The novelty of the work lies in the proposed architecture of modules based on an applied ontology, which makes it possible to adapt the solution to other subject areas.

Keywords: ontology, architectural acoustics, optimization, parallelism, ODSL.

Software Framework for Implementing User Interface Interaction in IOS Applications Based on Oculography

Nikita Stanislavovich Afanasev
198-245
Abstract:

Usage of gaze tracking technologies for the purpose of user interface interaction in iOS applications is significantly hampered by the absence of a unified approach to their integration. Current solutions are either strictly limited to their own use-case or made solely for research purposes and thus inapplicable to real-world problems. The focus of this article is the development of a software framework that performs gaze tracking using native technologies and suggests a unified approach to the development of gaze-driven iOS applications.

Keywords: gaze tracking, eye tracking, oculography, gesture recognizers, TrueDepth, ARKit, SceneKit, UIKit, iOS, UX, UI.

Career digital passport based on distributed ledger technology

Айдар Ильдарович Шайфутдинов, Айрат Фаридович Хасьянов
268-286
Abstract:

This paper considers problems associated with the documentation of employment process and management of employment records. Today, these tasks are solved through paper contracts and, in the Russian Federation, through «labor books». In this paper a software solution based on distributed ledger technology (blockchain) and smart contracts is proposed to replace the existing paper workflow.

Keywords: employment contracts, employment records, paper workflow, blockchain, smart contracts, digitalization, decentralized applications, Ethereum, Solidity, IPFS.

Implementation of interactive application for full-dome displays

Руслан Дамирович Ахметшарипов, Влада Владимировна Кугуракова, Мурад Рустэмович Хафизов
166-179
Abstract: This article describes the development of an interactive application for full-dome displays with several users. It explains a few solutions for interaction using wearable devices or smartphones.
Keywords: fulldome, interactive, multiuser application.

The concept of automatic creation tool for computer game scenario prototype

Гульнара Фаритовна Сахибгареева, Влада Владимировна Кугуракова
235-249
Abstract:

This paper describes the architecture of a tool for generating a scenario prototype from text, based on existing solutions.

We formulated software requirements and developed a prototype of the tool that illustrates the basic principle of the user's work with the application.

Keywords: game scripts, narrative design, scenario prototype, prototyping, game development, immersion, narrative.

Automation of Reading Related Data from Relational and Non-Relational Databases in the Context of using the JPA Standard

Angelina Sergeevna Savincheva, Alexander Andreevich Ferenets
656-678
Abstract:

The process of automating the management of the reading operation of related data from relational and non-relational databases is described.


The developed software tool is based on the use of the JPA (Java Persistence API) standard, which defines the capabilities of managing the lifecycle of entities in Java applications. An architecture for embedding in event processes has been designed, allowing the solution to be integrated into projects regardless of which JPA implementation is used. Support for various data loading strategies, types, and relationship parameters has been implemented. The performance of the tool has been evaluated.

Keywords: JPA, ORM, Java, databases, relational databases, non-relational databases.

Generation of academic groups and project teams based on learners data acquisition

Наталья Александровна Коргутлова, Светлана Юрьевна Басаргина, Михаил Михайлович Абрамский, Марат Альбертович Солнцев, Таисия Сергеевна Бузукина
193-208
Abstract:

The use of learners’ data in solutions for generating student academic groups, electives, and project teams is considered. Applications of machine learning clustering algorithms to these tasks are illustrated. The possibility of using social network data is shown.

Keywords: personal portrait of student, clustering, competence distribution, social networking analysis.

Automation of Footages Sorting by Screenplay Text for Video Editing

Andrey Dmitrievich Nemanov, Irina Sergeevna Shakhova
533-557
Abstract:

The video editing process involves numerous labor-intensive operations for sorting and preparing footage, requiring significant time investment. This article describes the development of a software solution that uses machine learning technology to automate these processes.


The primary focus is on creating a system capable of classifying and sorting media files according to the screenplay text, thereby increasing the efficiency of material preparation for editing. The system includes modules for speech recognition, audio and video classification, and algorithms for determining screenplay compliance.


Testing showed that the proposed system correctly classifies media files in most cases, significantly reducing rough-cut editing time.

Keywords: video editing, automation, machine learning, speech recognition, audio classification, video classification, coreml, parallel computing, screenplay, soundex, tf-idf, cosine similarity, natural language processing.

Algorithms for Formation of Metadata Mathematical Retro Collections Based on Analysis of Structural Features of Documents

Polina Olegovna Gafurova, Alexander Michailovich Elizarov, Evgeny Konstantinovich Lipachev
238-271
Abstract:

The solutions of the main problems associated with the formation of digital mathematical collections from documents published in the pre-digital period are presented – such collections are designated in the work as retro collections. Algorithms for creating a meta description of retro collections based on the analysis of the structure of mathematical documents and the use of software tools for extracting metadata are given. The description of retro-collections formed using the developed algorithms and included in the metadata factory of the digital mathematical library Lobachevskii-DML is given. The schemes for the formation of metadata and methods for normalizing the extracted metadata in accordance with the schemes and requirements of the integrating mathematical libraries are indicated.

Keywords: Lobachevskii-DML, metadata factory, metadata management services, archive collections.

Automated System for Numerical Similarity Evaluation of Android Applications

Valery Vladimirovich Petrov
336-365
Abstract:

This paper is devoted to the design and development of a system for automating numerical similarity assessment of Android applications. The task of application similarity evaluation is reduced to the similarity evaluation of sets of control flow graphs constructed from the code in the applications' classes.dex files. The similarity value is calculated from a similarity matrix. Graph edit distance and Levenshtein distance algorithms are used to compare control flow graphs. Application similarity criteria are formulated and their representation forms are investigated. Types of Android application models and methods of their construction are presented. A prototype of the system for automating the numerical evaluation of Android application similarity is developed. The software solution is optimized with the help of parallel programming tools. Experiments are carried out, and the conclusion is drawn that the developed system is able to detect similarities between Android applications.

Keywords: Android application similarity, program similarity, similarity matrix, control flow graph edit distance, similarity matrix visualisation, control flow graph.
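
The Levenshtein-based part of such a comparison can be illustrated with a minimal sketch. It assumes that each control flow graph has already been linearized into a sequence of instruction labels, which the abstract does not state; the aggregation rule in `app_similarity` (mean of the best match per graph) is likewise a hypothetical choice, not the paper's criterion, and graph edit distance is not covered.

```python
from typing import List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalize the distance to [0, 1], where 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similarity_matrix(
    app_a: List[Sequence[str]], app_b: List[Sequence[str]]
) -> List[List[float]]:
    """Entry [i][j] compares the i-th graph of app_a with the j-th of app_b."""
    return [[similarity(g, h) for h in app_b] for g in app_a]


def app_similarity(app_a: List[Sequence[str]], app_b: List[Sequence[str]]) -> float:
    """Mean of the best match per graph of app_a (greedy, assumed aggregation)."""
    matrix = similarity_matrix(app_a, app_b)
    if not matrix or not matrix[0]:
        return 0.0
    return sum(max(row) for row in matrix) / len(matrix)
```

Each matrix entry is independent of the others, so the rows can be computed in parallel, in line with the parallel optimization mentioned in the abstract.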

Educational analytics and adaptive training using student model in the intellectual learning systems

Михаил Владиславович Каяшев, Денис Юрьевич Макаров, Антон Александрович Марченко
181-192
Abstract:

To support adaptive training and educational analytics in intellectual learning systems, it is necessary to collect and process data on the student's progress and individual characteristics. This can be done by means of a student model. An analysis of approaches to student modeling has shown that combining several types of models is the optimal solution given the requirements of the learning system. Three approaches were chosen and united into one model: overlay, Bayesian network, and error model. The overlay model makes it possible to build individual training trajectories. Bayesian networks implement the competence-based approach to training. The error model keeps track of the student's incorrect knowledge and helps the student correct it at early stages. The student model that unites these approaches is better suited to personalized training, makes it possible to track the student's progress across various characteristics, and makes it easy to represent the map of the student's subjects, knowledge, and competences in various areas as a graph, which is a convenient and clear representation.

Keywords: intellectual learning system, student model, competence, adaptive training, educational analytics, overlay model, Bayesian network, domain model.
1 - 22 of 22 items

© 2015-2025 Kazan Federal University; Institute of the Information Society