
Russian Digital Libraries Journal

Published since 1998
ISSN 1562-5419

Search Results

Process Approach and Construction of the Database for Non-Core Asset Management in Credit Organizations

Marat Khaidarovich Shakirov
710-753
Abstract:

A method for building end-to-end management accounting in a Bank division specializing in non-core assets is proposed, together with a process approach and an algorithm for constructing a database from which key performance and control indicators are formed.


The key stages of the department's work are described, along with the attribute composition of the entities (sets) that arrive, are enriched, and are passed on at each stage. Through process modeling, a role model with access and editing rights for employees is built. Data sources (reference books) are proposed for optimizing and unifying the process of populating the database (tuples). A method of accessing the database through the Power Query add-in for Microsoft Excel is proposed, which makes it possible to collect data from files of all basic data types and to process and refine the received data. In the interactive programming environment Jupyter Notebook, mathematical and financial models for data analysis (logistic regression, decision tree, and the discounted cash flow method) were built on these data in order to predict costs and asset exposure periods and to decide on the optimal cost of placing property on the Bank's balance sheet and on the selling price. Based on ready-made libraries (matplotlib, seaborn, plotly), options for data visualization for management are proposed. Using the example of the Bank's division, the author describes the positive effects and opportunities that open up to management at different levels in solving day-to-day tasks and planning the division's activities. A technical specification is proposed for developing a showcase for the sale of non-core assets on the Bank's website as an environment for accumulating external data for flexible management decisions.

Keywords: non-core assets, process approach, database, Power Query, data visualization, mathematical and financial methods of data analysis, regression analysis, decision tree, discounted cash flow method.
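
The discounted cash flow method named among the abstract's financial models can be sketched in a few lines; the cash flows and discount rate below are illustrative, not figures from the paper.

```python
def discounted_cash_flow(cash_flows, rate):
    """Present value of a series of yearly cash flows at a constant discount rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Illustrative: an asset expected to yield 100 per year for three years, discounted at 10%
npv = discounted_cash_flow([100, 100, 100], 0.10)
```

Comparing such a present value against the proposed balance-sheet cost is one way a selling-price decision of the kind described could be framed.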

A Digital Platform for Integration and Analysis of Geophysical Monitoring Data from the Baikal Natural Zone

Andrey Pavlovich Grigoryuk, Lyudmila Petrovna Braginskaya, Igor Konstantinovich Seminskiy, Konstantin Zhanovich Seminskiy, Valeriy Viktorovich Kovalevskiy
303-316
Abstract:

This paper presents a digital platform for complex monitoring data on dangerous geodynamic, engineering-geological and hydrogeological processes occurring in the intensively used central ecological zone of the Baikal natural territory (CEZ BNT). The platform is intended for the integration and analysis of data coming from several test sites located within the CEZ BNT in order to assess the state of the geological environment and to forecast the manifestation of hazardous processes.


The platform is built on a client-server architecture. Storage, processing and analysis of data are carried out on the server, which users access via the Internet using a web browser. Several data filtering methods (linear frequency filtering, Savitzky-Golay and others), various methods of spectral and wavelet analysis, multifractal and entropy analysis, and spatial data analysis are currently available. The digital platform has been tested on real data.

Keywords: geophysical monitoring, digital platform, precursors, seismic forecast, earthquakes.
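
Of the filtering methods listed, the Savitzky-Golay filter fits a low-order polynomial over a sliding window; for a 5-point window and a quadratic fit it reduces to fixed convolution weights. A minimal pure-Python sketch on a synthetic signal (not data from the platform):

```python
def savgol5(y):
    """Savitzky-Golay smoothing, window 5, polynomial order 2 (interior points only)."""
    w = (-3, 12, 17, 12, -3)  # classical order-2 coefficients, to be divided by 35
    return [sum(c * y[i + k - 2] for k, c in enumerate(w)) / 35.0
            for i in range(2, len(y) - 2)]

# An order-2 filter reproduces a quadratic trend exactly at interior points
smoothed = savgol5([t * t for t in range(7)])
```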

Development of a Data Validation Module to Satisfy the Retention Policy Metric

Aigul Ildarovna Sibgatullina, Azat Shavkatovich Yakupov
159-178
Abstract:

Every year the global big data market grows. Analyzing these data is essential for good decision-making. When large amounts of information must be stored, big data technologies bring significant cost reductions through cloud services and distributed file systems. The quality of data analytics depends on the quality of the data themselves. This is especially important if the data are subject to a retention policy and migrate from one source to another, increasing the risk of data loss. Negative consequences of data migration are prevented through data reconciliation: a comprehensive verification of large amounts of information in order to confirm their consistency.


This article discusses probabilistic data structures that can be used to solve the problem and suggests an implementation: a data integrity verification module using a Counting Bloom filter. The module is integrated into Apache Airflow to automate its invocation.

Keywords: big data, retention policy, partition, parquet file, Bloom filter.
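
A Counting Bloom filter replaces the single bits of a classic Bloom filter with counters, so elements can be removed as well as added — useful when data with a retention policy expires. A minimal sketch of the structure (sizes and hash count are arbitrary, and this is not the authors' implementation):

```python
import hashlib

class CountingBloomFilter:
    """Counting Bloom filter: supports add/remove at the cost of counters instead of bits."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.counters = [0] * size

    def _positions(self, item):
        # Derive k positions from salted SHA-256 digests of the item
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for p in self._positions(item):
            self.counters[p] += 1

    def remove(self, item):
        for p in self._positions(item):
            if self.counters[p] > 0:
                self.counters[p] -= 1

    def __contains__(self, item):
        # May report false positives, never false negatives
        return all(self.counters[p] > 0 for p in self._positions(item))
```

Membership tests over file identifiers (e.g. parquet partition names) are one way such a filter can confirm that migrated data arrived intact.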

Methods and Algorithms for Increasing Linked Data Expressiveness (Overview)

Olga Avenirovna Nevzorova
808-834
Abstract: This review discusses methods and algorithms for increasing the expressiveness of linked data being prepared for Web publication. The main approaches to the enrichment of ontologies are considered, and the methods on which they are based and the tools for implementing them are described. The main stage in the general scheme of the linked data life cycle in a Linked Open Data cloud is the construction of a set of linked RDF triples. To improve data classification and the analysis of data quality, various methods are used to increase the expressiveness of linked data. The main ideas of these methods concern the enrichment of existing ontologies (an expansion of the basic knowledge scheme) by adding or improving terminological axioms. Enrichment methods build on techniques from various fields, such as knowledge representation, machine learning, statistics, natural language processing, formal concept analysis, and game theory.
Keywords: linked data, ontology, ontology enrichment, semantic web.

Optical Identification of Radio Sources of the RCR Catalogue with Virtual Observatory Tools

O.P. Zhelenkova, E.K. Mayorova, N.S. Soboleva, A.V. Temirova
Abstract: Mass identification of a list of radio sources with sky surveys in different ranges of the electromagnetic spectrum is of undoubted interest to astronomers. Identification of radio sources is not a straightforward procedure because of the different angular resolution, sensitivity limits and coordinate precision of the radio catalogues, as well as the morphological structure of the radio sources themselves.
Keywords: digital collections, virtual observatory, study of radio sources, multi-frequency sky surveys, subject-oriented search engines.

Automatic Annotation of Training Datasets in Computer Vision using Machine Learning Methods

Aleksey Konstantinovich Zhuravlev, Karen Albertovich Grigorian
718-729
Abstract:

This paper addresses the issue of automatic annotation of training datasets in the field of computer vision using machine learning methods. Data annotation is a key stage in the development and training of deep learning models, yet the process of creating labeled data often requires significant time and labor. This paper proposes a mechanism for automatic annotation based on the use of convolutional neural networks (CNN) and active learning methods.


The proposed methodology includes the analysis and evaluation of existing approaches to automatic annotation. The effectiveness of the proposed solutions is assessed on publicly available datasets. The results demonstrate that the proposed method significantly reduces the time required for data annotation, although operator intervention is still necessary.


The literature review includes an analysis of modern annotation methods and existing automatic systems, providing a better understanding of the context and advantages of the proposed approach. The conclusion discusses achievements, limitations, and possible directions for future research in this field.

Keywords: computer vision, machine learning, automatic data annotation, training datasets, image segmentation.
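
Active learning, part of the proposed annotation mechanism, typically routes to the human operator the samples the model is least sure about. A minimal uncertainty-sampling sketch; the probability vectors below are hypothetical model outputs, not results from the paper:

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(predictions, k):
    """Return indices of the k most uncertain samples (highest entropy first)."""
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]), reverse=True)
    return ranked[:k]

# Hypothetical softmax outputs for four images; the most ambiguous go to the operator
preds = [[0.98, 0.01, 0.01], [0.4, 0.35, 0.25], [0.7, 0.2, 0.1], [0.34, 0.33, 0.33]]
queue = select_for_annotation(preds, 2)
```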

V International Conference «Information Technologies in Earth Sciences and Applications for Geology, Mining and Economy. ITES&MP-2019»

Vera Viktorovna Naumova
1279-1300
Abstract:

The materials presented at the Conference describe the results of recent years in the following areas:
  • Open access to scientific data and knowledge in Earth sciences;
  • Data peculiarities in Earth sciences: new concepts and methods, tools for their collection, integration and processing in different information systems, including data-intensive systems;
  • Data mining and mathematical simulation of natural processes in Earth sciences; evolution of classical GIS applications in Earth sciences;
  • Applications to Critical Raw Materials (CRM); social aspects of mining (e.g., the Social Licence to Operate [SLO]); predictive mapping and applications to exploration, land use and the search for extensions of known deposits;
  • Intelligent data analysis, elicitation of facts and knowledge from scientific publications; thesauri, ontologies and conceptual modeling; Semantic Web, linked data, services, semantic structuring of content; applications for geosciences, e.g., ontology-based dynamic decision graphs for expert systems and decision-aid tools;
  • Application of remote sensing methods and technologies in Earth sciences: from satellites to unmanned aerial vehicles;
  • Information technologies for the demonstration and popularization of scientific achievements in Earth sciences;
  • Applications: environmental risks including mining wastes, natural hazards, water resource management, etc.

Keywords: information technology, Earth sciences.

Stability Studies of a Coupled Model to Perturbation of Initial Data

Konstantin Pavlovich Belyaev, Gury Mikhaylovich Mikhaylov, Alexey Nikolaevich Salnikov, Natalia Pavlovna Tuchkova
615-633
Abstract: The stability problem is considered in terms of the classical Lyapunov definition. A set of initial conditions, obtained from preliminary calculations, is specified, and the spread of the trajectories produced by numerical simulation is analyzed. This procedure is implemented as a series of ensemble experiments with the coupled MPI-ESM model of the Max Planck Institute for Meteorology (Germany). For numerical modeling, a series of different initial values of the characteristic fields was specified, and the model was integrated from each of these fields over different time periods. Extreme ocean level characteristics over a period of 30 years were studied. The statistical distribution was built, its parameters were estimated, and a statistical forecast 5 years ahead was studied. It is shown that the statistical forecast of the level corresponds to the calculated forecast obtained by the model. The localization of extreme level values was studied and these results were analyzed. Numerical calculations were performed on the Lomonosov-2 supercomputer of Lomonosov Moscow State University.
Keywords: non-linear circulation models, ensemble numerical experiments, analysis of the stability of model trajectories.
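
The Lyapunov-style procedure described — perturbing initial conditions and measuring the spread of the resulting trajectories — can be illustrated on a toy system. The logistic map below merely stands in for a nonlinear model run; it has no connection to MPI-ESM.

```python
def trajectory(x0, steps, r=3.7):
    """Iterate the logistic map, a toy stand-in for a nonlinear model run."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

def ensemble_spread(x0, eps, members, steps):
    """Std. dev. across trajectories started from slightly perturbed initial states."""
    finals = [trajectory(x0 + i * eps, steps)[-1] for i in range(members)]
    mean = sum(finals) / len(finals)
    return (sum((f - mean) ** 2 for f in finals) / len(finals)) ** 0.5

# Tiny perturbations of the initial state grow into a visible ensemble spread
spread = ensemble_spread(0.3, 1e-6, members=5, steps=50)
```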

Study Results for the Detection of Matching Content Using Citation Analysis

Vadim Nikolaevich Gureev, Nikolay Alekseevich Mazov
322-331
Abstract:

Translated plagiarism has spread widely in the scientific world and poses a serious problem because of the challenges of detecting it automatically. In the last five years, however, some progress has been observed in this area. The authors of this paper, as well as a foreign research team from several universities, independently proposed an approach to detecting plagiarism based on citation analysis: searching for the initial source of a suspect paper among papers with the same or similar references. The developed methods for detecting illegal use of borrowed text have successfully passed several tests. The report presents the results we have obtained over the last four years.

Keywords: detection of matching content, translated plagiarism, plagiarism detection, citation analysis, bibliographic database.
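
The citation-based approach described — matching a suspect paper against candidate sources by shared references — can be sketched as a simple overlap score. The reference lists here are toy identifiers, not a real bibliographic database:

```python
def reference_overlap(refs_a, refs_b):
    """Jaccard similarity of two papers' reference lists."""
    a, b = set(refs_a), set(refs_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def candidate_sources(suspect_refs, corpus, threshold=0.5):
    """Papers whose bibliography overlaps the suspect's above a threshold."""
    return [pid for pid, refs in corpus.items()
            if reference_overlap(suspect_refs, refs) >= threshold]

corpus = {
    "paper_A": ["r1", "r2", "r3", "r4"],
    "paper_B": ["r9", "r10"],
}
suspects = candidate_sources(["r1", "r2", "r3", "r5"], corpus)
```

The appeal of this signal for translated plagiarism is that reference lists survive translation largely intact even when the wording does not.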

Review of Existing Tools for Detecting Plagiarism and Self-Plagiarism

Alina Eduardovna Tlitova, Alexander Sergeevich Toshchev
143-159
Abstract: Scientists constantly need to publish the results of their work in order to stay relevant, meet current criteria, and remain within the scientific community. The well-known principle of “publish or perish” often pushes scientists to strive for quantity rather than quality [1]. Alongside problems of authorship, paid research and fabrication of results, plagiarism and self-plagiarism are among the most common violations. Their impact is more subtle, but no less disruptive to the scientific community.
The article provides an overview of existing tools for identifying borrowings in scientific articles. The tools are analyzed by comparing the systems across a number of characteristics and are tested on real data to investigate their performance and efficiency.
Keywords: Plagiarism, self-plagiarism, scientific ethics, text borrowing, text analysis.

Data Extraction from Similarly Structured Scanned Documents

Rustem Damirovich Saitgareev, Bulat Rifatovich Giniyatullin, Vladislav Yurievich Toporov, Artur Aleksandrovich Atnagulov, Farid Radikovich Aglyamov
667-688
Abstract:

Currently, the major part of transmitted and stored data is unstructured, and the amount of unstructured data grows rapidly each year, even though such data is hard to search and query and its processing is not automated. At the same time, electronic document management systems are growing. This paper proposes a solution for extracting data from paper documents, taking into account their structure and layout, based on document photos. We examine different approaches, including neural networks and plain algorithmic methods, present their results, and discuss them.

Keywords: neural networks, document structure.
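
For documents that share a layout, even a rule-based baseline recovers fields once OCR text is available. The field names and patterns below are hypothetical illustrations, not those of the paper's system:

```python
import re

# Hypothetical field patterns for one family of similarly structured documents
FIELDS = {
    "invoice_no": re.compile(r"Invoice\s+No\.?\s*:?\s*(\S+)"),
    "date": re.compile(r"Date\s*:?\s*(\d{2}\.\d{2}\.\d{4})"),
    "total": re.compile(r"Total\s*:?\s*([\d.,]+)"),
}

def extract_fields(ocr_text):
    """Apply per-field regexes to OCR output; missing fields map to None."""
    return {name: (m.group(1) if (m := pat.search(ocr_text)) else None)
            for name, pat in FIELDS.items()}

record = extract_fields("Invoice No: A-117\nDate: 05.03.2021\nTotal: 1499.00")
```

A learned layout model earns its keep exactly where such fixed patterns break down — shifted fields, OCR noise, or layout variants.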

Semantic Analysis of Documents in the Control System of Digital Scientific Collections

Shamil Makhmutovich Khaidarov
61-85
Abstract: Methods for the semantic parsing of documents in a control system for digital scientific collections, including electronic journals, are offered. Methods for processing documents containing mathematical formulas and for converting documents from OpenXML format to TeX format are considered. A search algorithm for mathematical formulas in collections of documents stored in OpenXML format is designed. The algorithm is implemented as an online service on the science.tatarstan platform.
Keywords: semantic analysis, publishing systems.

A Digital Knowledge Library for Genomic DNA Annotation

M.P. Ponomarenko, Yu.V. Ponomarenko, A.S. Frolov, A.V. Kochetov, F.A. Kolpakov, N.A. Kolchanov, N.L. Podkolodny
Abstract: The GeneExpress digital knowledge library was created to support the full cycle of genomic DNA annotation, including the accumulation of primary experimental data; automatic analysis of these data; documentation of the regularities revealed by this analysis; generation of active applications that use these regularities for genomic DNA annotation; and, as a novelty of GeneExpress, explanation of genomic DNA annotation results down to the primary experimental data from which the methods that produced these results were built. GeneExpress thus combines the search capabilities of a static information resource with the predictive capabilities of active applications. One of these capabilities is the explanation of annotation results down to the primary experimental data used to build the methods that produced them; the second is the complex analysis of sequenced DNA fragments by comparing the recognition results for functional sites whose coordinated operation regulates gene expression. The implementation and application of these new capabilities are demonstrated on the recognition of functional sites, the prediction of their biological activity, and the prediction of "high/low" gene expression levels. The GeneExpress knowledge library is publicly available on the Internet at http://wwwmgs.bionet.nsc.ru/systems/GeneExpress/.

Graduation Thesis: Intellectual Property, a Source of Personal Data. Legal Problems in Checking and Use

Pavel Petrovich Geyko
305-321
Abstract:

This work addresses some of the legal issues arising from the need to implement mandatory checking of final qualifying works for borrowings during final attestation in higher education programs, from posting these works in the electronic library systems of educational organizations, and from their subsequent use. In particular, it considers the need to comply with personal data legislation when processing personal information in the course of checking works for plagiarism and publishing them in library systems. Attention is paid to the enforcement of the intellectual property rights of the authors of final qualifying works while educational organizations carry out the responsibilities assigned to them. The legal analysis takes into account the draft law on education introduced by the Government of the Russian Federation, which proposes to oblige higher education organizations to post, in public access on their official website on the Internet, the full texts of the final qualification works of master's and specialist programs.

Keywords: originality, uniqueness, identifying plagiarism, plagiarism, personal data, intellectual property, exclusive rights, qualifying work, educational work (educational programs), scientific work, author, posting, publishing, electronic library, education.

Analysis of Spatial Data in Distributed Environments

E.V. Shulkin, S.M. Krasnopeev
Abstract: The article considers algorithms for the analysis of spatial data in distributed environments based on the Open Geospatial Consortium standards. Publication of the source code is also briefly touched upon. Emphasis is placed on our understanding of what is needed from the client side of Web services for spatial data analysis and how collaboration between the user and published data analysis tools can be organized. Preliminary results demonstrate the development of a universal client module for data analysis.
Keywords: analysis of spatial data, the Open Geospatial Consortium standards, source code, Web-services of spatial data, universal client module for data analysis.

Building a Subject Domain Ontology on the Basis of a Logical Data Model

Alexander M. Gusenkov, Naille R. Bukharaev, Evgeny V. Biryaltsev
390-417
Abstract: The technology of automated construction of a subject domain ontology, based on information extracted from the comments of the TATNEFT oil company's relational databases, is considered. The technology rests on building a converter (compiler) that translates the Epicentre logical data model of the Petrotechnical Open Software Corporation (POSC), presented in the form of ER diagrams and a set of descriptions in the EXPRESS object-oriented language, into the OWL ontology description language recommended by the W3C consortium. The basic syntactic and semantic aspects of the transformation are described.
Keywords: subject domain ontology, relational databases, POSC, OWL.
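
The ER-to-OWL translation described can be illustrated by emitting Turtle class and property declarations from an entity description. The entity and prefix below are generic illustrations, not actual POSC Epicentre types:

```python
def entity_to_owl(entity, attributes, prefix="ex"):
    """Render an ER entity as an OWL class with one datatype property per attribute."""
    lines = [f"{prefix}:{entity} a owl:Class ."]
    for attr in attributes:
        lines.append(f"{prefix}:{attr} a owl:DatatypeProperty ; "
                     f"rdfs:domain {prefix}:{entity} .")
    return "\n".join(lines)

# Hypothetical entity with two attributes, rendered as Turtle statements
turtle = entity_to_owl("Well", ["wellName", "spudDate"])
```

A full converter would also map EXPRESS relationships to object properties and attribute types to XSD ranges, which this sketch omits.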

Methodology of Network Analysis of Scientific Publications

Inna Gennadevna Olgina
646-672
Abstract:

The relevance of analyzing scientific publications stems from the fact that, with the development of Internet technologies, it became possible to collect data on publication citation networks. The current approach to the analysis of scientific publications is based on bibliometric indicators that take into account only the number of citations. However, network analysis, used mainly in the study of social networks, is being applied increasingly widely. The author has developed a methodology for effective analysis of scientific publications based on network analysis methods as an alternative to bibliometric methods. As criteria for evaluating scientific publications, relevant centrality measures of the citation network nodes are established: degree centrality, closeness centrality, betweenness centrality, authority centrality, and concentration centrality. The author presents experimental results validating the developed methodology for network analysis of the significance of scientific publications. Scientometric databases, which make it possible to track citations and identify the relevant citation networks, were used as primary sources of publication data. Applying the proposed methodology helps to identify publications that are important for the development of a scientific direction.

Keywords: citation network, publications, scientometrics, bibliometric analysis, network analysis, graph.
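
Of the centrality measures listed, degree centrality is the simplest: for a citation network it counts citations received, normalized by the maximum possible degree. A sketch over a toy edge list (citing → cited), purely illustrative:

```python
from collections import Counter

def in_degree_centrality(edges):
    """Citations received per paper, normalized by the maximum possible degree."""
    nodes = {n for e in edges for n in e}
    indeg = Counter(cited for _, cited in edges)
    denom = max(len(nodes) - 1, 1)
    return {n: indeg[n] / denom for n in nodes}

# Toy citation network: p1 and p2 both cite p3; p3 cites p4
edges = [("p1", "p3"), ("p2", "p3"), ("p3", "p4")]
centrality = in_degree_centrality(edges)
```

The other measures named (closeness, betweenness, authority) refine this picture by weighting where in the network the citations come from, not just how many there are.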

Towards Virtual Data Centres for Remote Sensing

E.B. Kudashev, M.A. Popov
Abstract: Remote sensing from satellites allows a global perspective on observations of the Earth to be developed. This paper gives an overview of some of the international initiatives created to improve the exploitation of remotely sensed data for environmental studies. The focus is on the activities and scientific challenges facing GEO/GEOSS on Earth observation. Other relevant international initiatives, such as CEOS, GMES and APARSEN, are also presented. The benefits of creating a Virtual Centre of Remote Sensing Data are also discussed.
Keywords: Remote Sensing, Infrastructure for Scientific Information Resources, GeoPortal, CEOS - Committee on Earth Observation Satellite, GMES - Global Monitoring for Environment and Security, APARSEN - Alliance for Permanent Access to Records of Science.

On the Implementation of a Web System for Mathematical Information

A.S. Adzhiev, A.N. Bezdushny, V.A. Serebryakov
Abstract: Based on an earlier analysis of Russian mathematical electronic resources, as well as the experience of foreign mathematical information systems, the paper describes the Math-Net.RU mathematical information system being created. The base platform of Math-Net.RU is the universal information system ISIR.
The project is described in terms of a list of requirements and conditions that the system must satisfy. Alternative options for implementing various components of the system are considered and analyzed, along with ways of solving the problems that arise. The categories of stored information, the target circle of users, and the required functionality are outlined. The overall architecture, the data schema, the user interfaces, and the ways of populating the system with information and of updating and synchronizing data from other information systems and databases are described. Problems of representing mathematical texts and formulas in information systems are considered, and a comparative analysis of existing storage formats is given. The prospects for Math-Net.RU's participation in the worldwide Math-Net mathematical information system being created, as well as the requirements for a participating system, are also outlined.

Statistical Analysis of Observation Data of Air-Sea Interaction in the North Atlantic

Natalia Pavlovna Tuchkova, Konstantin Pavlovich Belyaev, Gury Mikhaylovich Mikhaylov
122-133
Abstract:

The observational data for 1979-2018 in the North Atlantic region are analyzed. These data were obtained as a result of the project of the Russian Academy of Sciences for the study of the atmosphere in the North Atlantic (RAS-NAAD). The dataset provides many surface and free-atmosphere parameters based on the sigma model and meets many requirements of meteorologists, climatologists and oceanographers working in both research and operational fields. The paper analyzes the seasonal and long-term variability of the heat flux field and the water surface temperature in the North Atlantic. Schemes for analyzing diffusion processes were used as the main research method. Based on the given 40-year series from 1979 to 2018, diffusion process parameters such as the mean (process drift) and variance (process diffusion) were calculated, and their maps and time curves were constructed. Numerical calculations were performed on the Lomonosov-2 supercomputer of Lomonosov Moscow State University.

Keywords: UDC 519.6, UDC 519.2.
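
In the simplest discretization, the drift and diffusion estimates described are the mean and variance of the series increments per unit time. A minimal sketch on a synthetic series (not RAS-NAAD data):

```python
def drift_and_diffusion(series, dt=1.0):
    """Estimate drift (mean increment rate) and diffusion (increment variance rate)."""
    increments = [b - a for a, b in zip(series, series[1:])]
    n = len(increments)
    drift = sum(increments) / (n * dt)
    diffusion = sum((x - drift * dt) ** 2 for x in increments) / (n * dt)
    return drift, diffusion

# Synthetic monotone series: constant increments mean pure drift, zero diffusion
drift, diffusion = drift_and_diffusion([0.0, 0.5, 1.0, 1.5, 2.0])
```

Applied pointwise over a grid of 40-year series, such estimates yield exactly the kind of drift and diffusion maps the abstract mentions.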

Image Classification using Convolutional Neural Networks

Sergey Alekseevich Filippov
366-382
Abstract:

Nowadays, many different tools can be used to classify images, each aimed at a certain range of tasks. This article provides a brief overview of libraries and technologies for image classification. The architecture of a simple convolutional neural network for image classification is built. Image recognition experiments were conducted with the popular VGG 16 and ResNet 50 networks. Both showed good results; however, ResNet 50 overfitted because the training dataset contained images of the same type, while this network has more layers capable of capturing the attributes of objects in images. A comparative analysis of the trained models was carried out on images specially prepared for this experiment.

Keywords: image recognition, neural network, convolutional neural network, image classification, machine learning.
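
The convolutional layers at the core of both networks discussed reduce to a sliding dot product. A minimal single-channel 2-D convolution (valid padding, no strides) in pure Python, with a toy image rather than a real dataset:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most CNN frameworks)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A vertical-edge kernel responds where pixel intensity jumps left-to-right
image = [[0, 0, 1, 1]] * 3
edge = conv2d(image, [[-1, 1], [-1, 1]])
```

Deep networks like VGG 16 and ResNet 50 stack hundreds of such learned kernels, which is precisely what lets the deeper model capture finer object attributes (and overfit a homogeneous dataset).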

Description and Use of Thesauri in Information Systems: Approaches and Implementation

M.Kh. Nguyen, A.S. Adzhiev
Abstract: The article considers different approaches to the formalization of thesauri, as well as the ISO, ANSI and GOST standards. Some possible platforms for such formalization are analyzed; the specifics of working with thesauri in information systems and the problems that arise are described, along with the requirements for implementing a thesaurus within the Semantic Web [12].
The features of, and differences between, resource classifiers and conventional terminological and linguistic thesauri are considered. A comparative analysis of existing data schemas and approaches to implementing thesauri for RDF-based information systems is given. Questions of organizing user interfaces for working with thesauri and of using them for search in an information system are also considered, as are thesaurus administration interfaces.
In the second part of the article, based on the analysis performed, requirements for describing a thesaurus in ISIR are formulated, and a general universal data schema for representing a thesaurus in this information system, satisfying the listed requirements, is presented, together with a small example of implementing the MSC classifier in it.
Based on the proposed general universal schema and the formulated requirements, the implementation of the thesaurus in ISIR is described.

Support System for the Selection of Information Sources in Citation Networks

Inna Gennadevna Olgina
76-96
Abstract:

With the advent of network science, it has become possible to explore complex network systems, including social and information networks, by representing them as graph models. The exponential growth of the total volume of scientific publications makes the analysis of their interrelations relevant. In network science, models and methods in the field of so-called citation networks are being developed to solve these problems; however, network metrics are not used when analyzing publications in citation databases. The paper considers the creation of a decision support system for the selection of information sources based on citation data for scientific publications. A software package has been developed for identifying important publications in a given thematic area. It is based on a method of ranking publications by importance through citation network analysis, which makes it possible to identify publications that do not stand out when ranked by known bibliometric indicators or by known node centrality measures in their pure form. A study and comparative analysis of software for visualizing and exploring graphs and social networks of all types has been conducted. Studies confirming the effectiveness of the proposed decision support system in the selection of information sources have been carried out.

Keywords: citation network, publication, scientometry, decision support system, software architecture, network analysis, graph.
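
Authority-style ranking of the kind used here can be sketched as power-iteration PageRank over the citation graph; the damping factor and toy edges are illustrative, not the paper's method.

```python
def pagerank(edges, damping=0.85, iters=50):
    """Power-iteration PageRank over a directed edge list (citing -> cited)."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [b for a, b in edges if a == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes  # dangling nodes spread rank uniformly
            for t in targets:
                new[t] += damping * rank[n] / len(targets)
        rank = new
    return rank

# The paper cited by both others accumulates the most rank
rank = pagerank([("p1", "p3"), ("p2", "p3")])
```

Unlike a raw citation count, such iterative scores weight a citation by the importance of the citing paper, which is what lets them surface publications that plain bibliometric indicators miss.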

Use of REST API and WebSocket Interface Algorithms for Structuring the Three-Link Level of Emergent Systems and Displaying Media Systems

Mikhail Mikhailovich Blagirev, Alexey Olegovich Kostyrenkov
415-428
Abstract:

An analysis of the speed and efficiency of data transfer using the WebSocket and REST API protocols was carried out. To compare the speed of processing stream objects and to identify the more reliable technology for developing APIs, expansions of basic functions into Taylor and Fourier series were used. The analysis revealed that the REST API is the faster and more accessible resource for transmitting information in a bitwise transformation, and that this protocol scales better in the number of processed units, which makes it possible to expand the number of tests performed.

Keywords: scalability, logging, structuring, REST API, WebSocket.
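
Series expansions like those the authors used as test payloads can be generated deterministically. A partial-sum Taylor expansion of exp(x) is sketched below; how the paper's benchmark actually framed its payloads is an assumption here.

```python
import math

def taylor_exp(x, terms):
    """Partial sum of the Taylor series for exp(x) around 0."""
    return sum(x ** n / math.factorial(n) for n in range(terms))

# Each request or frame can carry one partial sum; more terms -> closer to exp(x)
payload = [taylor_exp(1.0, k) for k in range(1, 12)]
```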

Analysing Machine Learning Models based on Explainable Artificial Intelligence Methods in Educational Analytics

Dmitriy Arturovich Minullin, Fail Mubarakovich Gafarov
294-315
Abstract:

The problem of predicting early dropout of students at Russian universities is pressing and requires new approaches. To address it, predictive systems can be built on student data available in university information systems. This paper investigates machine learning models for predicting early student dropout, trained on student characteristics and performance data. The main scientific novelty of the work lies in the use of explainable AI methods to interpret and explain the performance of the trained models. These methods make it possible to understand which input features (student characteristics) have the greatest influence on the prediction results of the trained models, and can also help to understand why the models make certain decisions. The findings expand the understanding of how various factors influence early student dropout.

Keywords: educational analytics, data mining, machine learning, explainable AI.
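
A model-agnostic explanation of the kind discussed can be sketched as permutation importance: shuffle one feature column and measure the accuracy drop. The toy "dropout" model and data below are illustrative only, not the paper's models or features.

```python
import random

def accuracy(model, X, y):
    """Fraction of samples where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature, seed=0):
    """Accuracy drop after shuffling one feature column across samples."""
    rng = random.Random(seed)
    column = [row[feature] for row in X]
    rng.shuffle(column)
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
    return accuracy(model, X, y) - accuracy(model, X_perm, y)

# Toy model: only feature 0 (say, average grade) drives the dropout prediction
model = lambda row: 1 if row[0] < 0.5 else 0
X = [[0.2, 0.9], [0.8, 0.1], [0.3, 0.3], [0.9, 0.7], [0.1, 0.5], [0.7, 0.2]]
y = [model(row) for row in X]
importance_grade = permutation_importance(model, X, y, feature=0)
importance_other = permutation_importance(model, X, y, feature=1)
```

Shuffling the unused feature leaves accuracy untouched, while shuffling the decisive one degrades it — the intuition behind feature-attribution explanations.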

© 2015-2025 Kazan Federal University; Institute of the Information Society