Russian Digital Libraries Journal

Published since 1998
ISSN 1562-5419
Search Results

Science Data Infrastructure for Access to Earth Observation Satellite Data

Е.Б. Кудашев
Abstract: The virtual research centre for digital preservation in Europe provides a natural basis for the long-term consolidation of digital preservation research and expertise. The Spatial Data Infrastructure will cover technical methods for preservation, access and, most importantly, re-use of data holdings over the whole lifecycle; legal and economic issues, including costs, governance and digital rights; and outreach within and outside the consortium to help create a discipline of data curators with appropriate qualifications. The main tasks of Spatial Data Infrastructure (SDI) development are building a global infrastructure for IT and geodata; satellite information harmonization; use of an agreed set of standards; clear documentation describing the parts of the system; interoperability between independently created applications and databases; common standards for their interfaces, protocols and data formats; and, finally, support of a general data policy for the creation, access and maintenance of satellite information. The fundamental principle of the Russian segment of the SDI is interoperability, i.e. the ability of heterogeneous services and data catalogues to interact within a unified information system. The Russian segment of the distributed information system has been built on the basis of EOLI-XML and SSE technologies.
Keywords: Science Data Infrastructure, e-Science, Earth Observation data, Scientific e-Infrastructure, Open Data Infrastructure, Data management.

Basic Services of the Metadata Factory of the Digital Mathematical Library Lobachevskii-DML

Polina Gafurova, Alexander Elizarov, Evgeny Konstantinovich Lipachev
336-381
Abstract: A number of problems related to the construction of the metadata factory of the digital mathematical library Lobachevskii-DML have been solved. By a metadata factory we mean a system of interconnected software tools aimed at creating, processing, storing and managing metadata of digital library objects, and allowing the created electronic collections to be integrated into aggregating digital scientific libraries. In order to select the optimal software tools from among existing ones and to modernize them: we discussed the features of the presentation of the metadata of documents of various electronic collections, related both to the formats used and to changes in the composition and completeness of the metadata set over the entire publication history of the corresponding scientific journal; we presented and characterized software tools for managing scientific content and methods for organizing the automated integration of repositories of mathematical documents with other information systems; we discussed such an important function of the digital library metadata factory as the normalization of metadata in accordance with the formats of other aggregating libraries. As a result of the development of the metadata factory of the digital mathematical library Lobachevskii-DML, we proposed a system of services for the automated generation of metadata for electronic mathematical collections; we developed an XML metadata presentation language based on the Journal Archiving and Interchange Tag Suite (NISO JATS); we created software tools for normalizing the metadata of electronic collections of scientific documents in formats developed by international organizations that aggregate resources in mathematics and computer science; we developed an algorithm for converting metadata to the oai_dc format and generating the archive structure for import into the DSpace digital repository; and we proposed and implemented methods for integrating the electronic mathematical collections of Kazan University into domestic and foreign digital mathematical libraries.
Keywords: digital libraries, digital mathematical library, metadata generation, metadata extraction, metadata normalization, metadata factory, NISO JATS, semantic relationships, Lobachevskii-DML.
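Since the abstract mentions converting metadata to the oai_dc format for import into DSpace, here is a minimal, purely illustrative sketch of such a mapping; it is not the Lobachevskii-DML code, and the record fields and values are invented placeholders:

```python
# Sketch: map a simple metadata record to an oai_dc XML fragment of the kind
# used for Dublin Core exchange (illustrative only; not the authors' converter).
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

def to_oai_dc(record: dict) -> bytes:
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field in ("title", "creator", "date", "identifier", "language"):
        for value in record.get(field, []):
            ET.SubElement(root, f"{{{DC}}}{field}").text = value
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

record = {  # hypothetical retro-collection article
    "title": ["On a class of boundary value problems"],
    "creator": ["Ivanov, I. I."],
    "date": ["1967"],
    "identifier": ["https://example.org/article/123"],  # placeholder URL
}
print(to_oai_dc(record).decode("utf-8"))
```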

Automatic Replenishment of Metadata of Digital Publications using Semantic Services of the Internet

Polina Olegovna Gafurova
164-186
Abstract:

The article describes approaches to replenishing the metadata of documents in the electronic collections of a digital mathematical library. Open semantic web resources are used as the source of the replenishment. For this purpose, software tools have been developed to search for the necessary data and include it in the metadata set. A separate block of metadata of a scientific article is formed from the affiliations of the authors presented in the document. Typically, the affiliation given in a document does not contain sufficient data to generate a complete metadata set. A method has been developed for enriching author affiliation metadata using the Research Organization Registry (ROR), an open register of identifiers of scientific organizations, as well as the links between ROR and other semantic resources. This method was applied to the collections of articles of the journal "Digital Libraries" for 2021–2022.


The article also describes a method for connecting new electronic collections to the digital mathematical library Lobachevskii-DML, and a method for transforming metadata into a digital format available for downloading.

Keywords: ROR, Wikidata, digital libraries, affiliation metadata, Lobachevskii-DML.
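As a rough illustration of the kind of ROR lookup the abstract describes, the sketch below queries the public ROR REST API for an affiliation string; the endpoint and parameter are assumptions about the public API, and this is not the authors' pipeline:

```python
# Sketch: resolve an affiliation string to a ROR identifier via the public ROR API.
import requests

def lookup_ror(affiliation: str):
    """Return the best-ranked ROR match for an affiliation string, or None."""
    resp = requests.get(
        "https://api.ror.org/organizations",
        params={"query": affiliation},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json().get("items", [])
    if not items:
        return None
    top = items[0]  # take the top-ranked candidate
    return {"ror_id": top.get("id"), "name": top.get("name")}

print(lookup_ror("Kazan Federal University"))
```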

Process Approach and Construction of the Database for Non-Core Asset Management in Credit Organizations

Marat Khaidarovich Shakirov
710-753
Abstract:

A method for building end-to-end management accounting in a subdivision of the Bank specializing in working with non-core assets is proposed. A process approach is proposed, along with an algorithm for building a database for the formation of key performance and control indicators.


The key stages of the department's work are described, together with the attribute composition of the entities (sets) received, enriched and transmitted at each stage. Based on process modeling, a role model with access and editing rights for employees has been built. Data sources (reference books) are proposed for optimizing and unifying the process of populating the database (tuples). A method of accessing the database in the Power Query add-in for Microsoft Excel is proposed, which allows data to be collected from files of all basic data types and the received data to be processed and refined. In the interactive programming environment Jupyter Notebook, mathematical and financial models for data analysis (logistic regression, decision tree and the discounted cash flow method) were built on the basis of the data in order to predict costs and the timing of asset exposure, and to decide on the optimal value at which to put property on the Bank's balance sheet and on the selling price. Based on ready-made libraries (matplotlib, seaborn, plotly), options for data visualization for management are proposed. Using the example of the Bank's division, the author describes the positive effects and opportunities that open up to management of different levels in solving day-to-day tasks and planning the activities of the division. A technical specification was proposed for the development of a showcase for the sale of non-core assets on the Bank's website as an environment for the accumulation of external data for making flexible management decisions.

Keywords: non-core assets, process approach, database, Power Query, data visualization, mathematical and financial methods of data analysis, regression analysis, decision tree, discounted cash flow method.
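For readers unfamiliar with the discounted cash flow method mentioned among the financial models, a minimal sketch of the calculation follows; the cash flows and discount rate are made-up example numbers, not the Bank's data:

```python
# Sketch: present value of a series of yearly cash flows at a constant discount rate.
def discounted_cash_flow(cash_flows, rate):
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

expected_cash_flows = [1.2e6, 1.1e6, 0.9e6, 0.8e6]  # hypothetical yearly proceeds
value = discounted_cash_flow(expected_cash_flows, rate=0.12)
print(f"Estimated asset value: {value:,.0f}")
```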

Procedure for Comparing Text Recognition Software Solutions For Scientific Publications by the Quality of Metadata Extraction

Ilia Igorevich Kuznetsov, Oleg Panteleevich Novikov, Dmitry Yurievich Ilin
654-680
Abstract:

Metadata of scientific publications are used to build catalogs, determine the citation of publications, and perform other tasks. Automation of metadata extraction from PDF files provides means to speed up the execution of the designated tasks, while the possibility of further use of the obtained data depends on the quality of extraction. Existing software solutions were analyzed, after which three of them were selected: GROBID, CERMINE, ScientificPdfParser. A procedure for comparing software solutions for recognizing texts of scientific publications by the quality of metadata extraction is proposed. Based on the procedure, an experiment was conducted to extract 4 types of metadata (title, abstract, publication date, author names). To compare software solutions, a dataset of 112,457 publications divided into 23 subject areas formed on the basis of Semantic Scholar data was used. An example of choosing an effective software solution for metadata extraction under the conditions of specified priorities for subject areas and types of metadata using a weighted sum is given. It was determined that for the given example CERMINE shows efficiency 10.5% higher than GROBID and 9.6% higher than ScientificPdfParser.

Keywords: text recognition, scientific publications, metadata, data extraction quality, procedure.
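A toy sketch of the weighted-sum selection described in the abstract is given below; the per-tool quality scores and the weights are invented placeholders, not the paper's measurements:

```python
# Sketch: choose a metadata-extraction tool by a weighted sum of per-field quality scores.
quality = {  # extraction quality per tool and metadata type (0..1), placeholder values
    "GROBID":              {"title": 0.95, "abstract": 0.80, "date": 0.70, "authors": 0.85},
    "CERMINE":             {"title": 0.93, "abstract": 0.88, "date": 0.78, "authors": 0.84},
    "ScientificPdfParser": {"title": 0.90, "abstract": 0.82, "date": 0.75, "authors": 0.80},
}
weights = {"title": 0.4, "abstract": 0.3, "date": 0.1, "authors": 0.2}  # task priorities

scores = {tool: sum(weights[k] * v for k, v in fields.items())
          for tool, fields in quality.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```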

The SAO RAS archive system. Maintenance and upgrading

О.П. Желенкова, В.В. Витковский, Т.А. Пляскина
Abstract: The observatory archive system includes a digital data storage and a search information system (SIS) with a dynamic web-based interface and HTTP data access. To date, the system includes 16 digital collections of observational data (local archives) obtained with different instruments operating, or having operated, on the telescopes. The earliest data date back to the end of 1994. Six local archives are currently being actively replenished. The data storage includes a temporary storage area, located on a file server of the 6-m telescope (BTA), and a permanent storage area. The permanent storage area includes CD/DVD discs, a hard disk of the dedicated archive server and a large-capacity flash disk. For data protection during emergency situations or disk I/O defects, we keep two full copies of the CD/DVD discs and two copies of the archive data on the hard disk of the archive server. One copy (A0) repeats the optical discs; the other (A1), with a slightly modified directory structure, is actually used by the SIS. Digital media and read-write drives cannot be considered long-term storage devices: for long-term storage of digital data, the information must be rewritten onto a new type of media every 5-10 years, and the archive copies A0 and A1 are maintained through this rewriting procedure. The archival data (A1) are duplicated on a flash disk together with a dump of tables and programs. There is a system backup for restoring the server after an emergency. To support the modernization of the SIS, we maintain two database schemas, test and operational. All our development takes place in the test database schema; after a modification of the schema has been checked, the SIS is switched to the updated version of the database. The original copy A0 and the availability of the test database schema make it possible to modernize the SIS even at the level of the tables. Currently, the SIS is implemented on the PostgreSQL 8.3.7 DBMS.
Keywords: digital collections of experimental data, web access to observation archives, virtual observatory, subject-oriented databases.

Development of a Data Validation Module to Satisfy the Retention Policy Metric

Aigul Ildarovna Sibgatullina, Azat Shavkatovich Yakupov
159-178
Abstract:

Every year the size of the global big data market grows, and analysing these data is essential for good decision-making. Big data technologies such as cloud services and distributed file systems lead to a significant cost reduction when large amounts of information need to be stored. The quality of data analytics depends on the quality of the data themselves. This is especially important if the data have a retention policy and migrate from one source to another, increasing the risk of data loss. Prevention of the negative consequences of data migration is achieved through the process of data reconciliation: a comprehensive verification of large amounts of information in order to confirm their consistency.


This article discusses probabilistic data structures that can be used to solve the problem and suggests an implementation: a data integrity verification module based on a Counting Bloom filter. The module is integrated into Apache Airflow to automate its invocation.

Keywords: big data, retention policy, partition, parquet file, Bloom filter.
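To illustrate the data structure underlying the proposed module, here is a compact Counting Bloom filter sketch; it is illustrative only, is not the authors' module, and omits the Apache Airflow integration:

```python
# Sketch: a Counting Bloom filter supporting add/remove/membership checks.
import hashlib

class CountingBloomFilter:
    def __init__(self, size: int = 1 << 16, num_hashes: int = 4):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _indexes(self, item: str):
        # Derive k independent positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item: str):
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def remove(self, item: str):
        for idx in self._indexes(item):
            if self.counters[idx] > 0:
                self.counters[idx] -= 1

    def __contains__(self, item: str) -> bool:
        # May return false positives, never false negatives (for valid removals).
        return all(self.counters[idx] > 0 for idx in self._indexes(item))

cbf = CountingBloomFilter()
cbf.add("row-42")
print("row-42" in cbf, "row-43" in cbf)
```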

Electronic Database on Experimental Bond Dissociation Energies of Organic Compounds

Vladimir Evgen'evich Tumanov, Andrey Ivanovich Prokhorov
1203-1216
Abstract:

The presented web database on experimental homolytic bond dissociation energies of organic compounds is intended for free use by a wide range of theoreticians and practitioners. The paper provides a brief overview of the sources of bond dissociation energies of organic molecules, which are calculated theoretically, measured experimentally or estimated from kinetic and thermochemical experimental data, and of their presentation in Internet databases. A web database on homolytic bond dissociation energies of organic compounds is presented. The reported bond dissociation energies are calculated from experimental kinetic and thermochemical data. Descriptions of the experimental data sources, the classes of organic compounds and the calculation methods are given. The logical structure of the database and a description of the main fields of its tables are given. The main search form of the database interface is presented, and an example of a search result for a specific organic compound is given. Bond dissociation energies are calculated at a temperature of 298.15 K, which is usually not specified in most sources. Analogues of the present database are inferior to it in taking temperature correlations into account. Currently, work is underway to further analyze the published data taking entropy effects into account.

Keywords: electronic directory, organic compounds, bond dissociation energy, database, internet.

A Digital Platform for Integration and Analysis of Geophysical Monitoring Data from the Baikal Natural Zone

Andrey Pavlovich Grigoryuk, Lyudmila Petrovna Braginskaya, Igor Konstantinovich Seminskiy, Konstantin Zhanovich Seminskiy, Valeriy Viktorovich Kovalevskiy
303-316
Abstract:

This paper presents a digital platform for complex monitoring data on dangerous geodynamic, engineering-geological and hydrogeological processes occurring in the region of intensive nature management of the central ecological zone of the Baikal natural territory (CEZ BNT). The platform is intended for the integration and analysis of data coming from several monitoring sites located within the CEZ BNT in order to assess the state of the geological environment and to forecast the manifestation of hazardous processes.


The platform is built on a client-server architecture. Data storage, processing and analysis are carried out on the server, which users can access via the Internet using a web browser. Several data filtering methods (linear frequency filtering, Savitzky-Golay and others), various methods of spectral and wavelet analysis, multifractal and entropy analysis, and spatial data analysis are currently available. The digital platform has been tested on real data.

Keywords: geophysical monitoring, digital platform, precursors, seismic forecast, earthquakes.
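As a quick illustration of one of the filtering methods named in the abstract, the following sketch applies Savitzky-Golay smoothing to a synthetic noisy series; it assumes SciPy is available and does not use the platform's actual monitoring data:

```python
# Sketch: Savitzky-Golay smoothing of a noisy synthetic signal.
import numpy as np
from scipy.signal import savgol_filter

t = np.linspace(0, 10, 500)
signal = np.sin(t) + 0.3 * np.random.randn(t.size)            # noisy synthetic series
smoothed = savgol_filter(signal, window_length=31, polyorder=3)

# Residual spread with respect to the clean component, before and after smoothing.
print(float(np.std(signal - np.sin(t))), float(np.std(smoothed - np.sin(t))))
```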

Tool for Sequential Snapshotting of Aggregated Data from Streaming Data

Artem Igorevich Gurianov, Azat Shavkatovich Yakupov
414-436
Abstract:

In the modern world, streaming data has become widespread in many subject areas. The task of processing streaming data in real time, with minimal delay, is highly relevant.


In stream data processing, various approximate algorithms are often used, which have much higher time and memory efficiency than exact algorithms. In addition, there is often a need to forecast the state of the stream.


Thus, there is currently a need for a tool for sequential snapshotting of aggregated data from streaming data, enabling flow state prediction and approximate algorithms for stream data processing.


The authors of the article have developed such a tool, reviewed its architecture and mechanism of functioning, and evaluated the prospects for its further development.

Keywords: streaming data, stream processing, stream analysis, materialized views, streaming algorithms, approximate algorithms, stream forecasting.
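A toy sketch of the idea of sequential snapshotting of a running aggregate over a stream is shown below; it illustrates the concept only and is not the authors' tool or architecture:

```python
# Sketch: consume a stream, keep a running aggregate, and record periodic snapshots.
from dataclasses import dataclass
from typing import List

@dataclass
class RunningAggregate:
    count: int = 0
    total: float = 0.0

    def update(self, value: float):
        self.count += 1
        self.total += value

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0

def snapshot_stream(stream, every: int) -> List[dict]:
    """Record a snapshot of the aggregate after every `every` events."""
    agg, snapshots = RunningAggregate(), []
    for i, value in enumerate(stream, start=1):
        agg.update(value)
        if i % every == 0:
            snapshots.append({"events": agg.count, "mean": agg.mean})
    return snapshots

print(snapshot_stream(iter(range(100)), every=25))
```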

Solving the Problem of Classifying the Emotional Tone of a Message with Determining the Most Appropriate Neural Network Architecture

Danis Ilmasovich Bagautdinov, Salman Salman, Vladislav Alekseevich Alekseev, Rustamdzhon Murodzhonovich Usmonov
396-413
Abstract:

To determine the most effective approach for solving the task of classifying the emotional tone of a message, we trained selected neural network models on various sets of training data. Next, based on the performance metric of the percentage of correctly classified responses on a test data set, we compared combinations of training data sets and various models trained on them. During the writing of this article, we trained four neural network models on three different sets of training data. By comparing the accuracy of the responses from each model trained on different training data sets, conclusions were drawn regarding the neural network model best suited for solving the task at hand.

Keywords: NLP, sentiment detection, neural networks, comparison of neural network models, LSTM, CNN, BiLSTM.
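One of the architectures named in the keywords, a BiLSTM text classifier, might look roughly like the following Keras sketch; the vocabulary size, sequence length and number of tone classes are assumed values, not the paper's configuration:

```python
# Sketch: a BiLSTM classifier for emotional-tone labels over padded token ids.
import tensorflow as tf

VOCAB_SIZE, MAX_LEN, NUM_CLASSES = 20_000, 128, 3  # assumed values

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),                # token embeddings
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),   # BiLSTM encoder
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # tone classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.build(input_shape=(None, MAX_LEN))
model.summary()
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5)
```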

Computed knowledge base for describing information resources in molecular spectroscopy. 5. Expert data quality

А.Ю. Ахлёстин, Н.А. Лаврентьев, А.И. Привезенцев, А.З. Фазлиев
Abstract: It is shown that trust in the content of information resources can be assessed by means of a publishing criterion, with the information resources being of the trusted or distrusted type. The task of assessing trust consists of four subtasks: (1) building multisets of physical quantities available in primary data sources, (2) alignment of the values of physical quantities, (3) formulation of quantitative restrictions for the publishing criterion in different ranges of change of physical quantities, and (4) decomposition of expert data. Spectral data publishing criteria and the restrictions required for solving data alignment tasks are outlined. Alignment results have been tabulated. Using vacuum wavenumbers as an example, restrictions inherent in publishing criteria are formulated. The assessments of content trust obtained from the solutions of the expert data decomposition tasks are presented as OWL ontologies. Building knowledge bases of this kind at virtual data centers intended for data-intensive science will enable automatic selection of spectroscopic information resources exhibiting a high degree of trust.
Keywords: quantitative spectroscopy, data alignment, content trust, publishing criterion.

Semantic analysis of documents in the control system of digital scientific collections

Шамиль Махмутович Хайдаров
61-85
Abstract: Methods for the semantic parsing of documents in a management system for digital scientific collections, including electronic journals, are offered. Methods for processing documents containing mathematical formulas and for converting documents from the OpenXML format to the TeX format are considered. A search algorithm for mathematical formulas in collections of documents stored in the OpenXML format is designed. The algorithm is implemented as an online service on the science.tatarstan platform.
Keywords: semantic analysis, publishing systems.

Application of Synthetic Data to the Problem of Anomaly Detection in the Field of Information Security

Artem Igorevich Gurianov
187-200
Abstract:

Currently, synthetic data is highly relevant in machine learning. Modern synthetic data generation algorithms make it possible to generate data that is very similar in statistical properties to the original data. Synthetic data is used in practice in a wide range of tasks, including those related to data augmentation.


The author of the article proposes a data augmentation method that combines the approaches of increasing the sample size using synthetic data and synthetic anomaly generation. This method has been used to solve an information security problem of anomaly detection in server logs in order to detect attacks.


The model trained for the task shows high results. This demonstrates the effectiveness of using synthetic data to increase sample size and generate anomalies, as well as the ability to use these approaches together with high efficiency.

Keywords: synthetic data, anomaly detection, information security, anomaly generation, data augmentation, machine learning.
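A loose illustration of the two ideas combined in the abstract, enlarging the normal sample with synthetic points and generating synthetic anomalies, is sketched below on toy Gaussian data; the features, generation scheme and classifier are assumptions, not the paper's method:

```python
# Sketch: augment "normal" data with synthetic copies, generate synthetic anomalies,
# and train a classifier to separate them (toy data, not server-log features).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 8))                          # "real" normal samples
synthetic_normal = normal + rng.normal(0.0, 0.05, size=normal.shape)  # jittered synthetic copies
synthetic_anomalies = rng.normal(0.0, 1.0, size=(200, 8)) + rng.choice([-6, 6], size=(200, 8))

X = np.vstack([normal, synthetic_normal, synthetic_anomalies])
y = np.array([0] * (len(normal) + len(synthetic_normal)) + [1] * len(synthetic_anomalies))

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```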

New Method of Description of Eddy-Covariance Ecologic Data

Raoul Rashidovich Nigmatullin, Alexander Alekseevich Litvinov, Sergey Igorevich Osokin
41-75
Abstract:

In this paper, the authors propose the foundations of an original theory of quasi-reproducible experiments (QRE) based on the testable hypothesis that there exists an essential correlation (memory) between successive measurements. Based on this hypothesis, which the authors define for brevity as the verified partial correlation principle (VPCP), it can be proved that there exists a universal fitting function (UFF) for quasi-reproducible (QR) measurements. In other words, there is some common platform or "bridge" on which, figuratively speaking, a true theory (claiming to describe data from first principles or verifiable models) meets the experimental data offered for its verification, maximally "cleaned" from the influence of uncontrollable factors and of the apparatus/software function. In practice, the proposed theory gives a researcher a method for purifying the initial data and finally suggests a curve that is periodic and cleaned of the influence of uncontrollable factors. This final curve corresponds to an ideal experiment.


The proposed theory has been tested on eddy covariance ecologic data related to the content of CH4, CO2 and H2O vapor in the local atmosphere where the corresponding detectors for measuring the content of these gases are located.


For the tested eddy covariance data associated with the presence in the atmosphere of the two gases CH4 and CO2 and of H2O vapor, there is no simple hypothesis containing a minimal number of fitting parameters; therefore, the fitting function that follows from this theory can serve as the only reliable quantitative description of this kind of data belonging to the tested complex system. We should also note that the final fitting function, freed from uncontrollable factors, becomes purely periodic and corresponds to an ideal experiment. Practical applications of this theory, its place among other alternative approaches (especially those touching the professional interests of ecologists) and its further development are discussed in the paper.



Keywords: quasi-reproducible experiments, complex systems, verified partial correlation principle, universal fitting function, quasi-periodic measurements, quasi-reproducible measurements, memory effects, eddy covariance.

Proposals on Metadata Sets for the Scientific Information Resources of the Unified Scientific Information Space (ENIP) of RAS

А.А. Бездушный, А.Н. Бездушный, А.К. Нестеренко, В.А. Серебряков, Т.М. Сысоев
Abstract: The formation of metadata element sets and ontologies for the scientific information resources of the Russian Academy of Sciences (RAS) within the project of the Unified Scientific Information Space (ENIP) of RAS is considered. The needs, goals and objectives of organizing ENIP RAS as an environment of interconnected distributed heterogeneous systems are discussed. An overview is given of the subject areas and resource types about which information is planned to be represented in ENIP. The methodology used to describe metadata schemas is presented, together with a list of the analyzed standards and metadata schema proposals used in developing the ENIP schemas. Examples of ENIP metadata schemas and of the XML representation of data for their exchange within ENIP are considered.

Digital 3D-Objects Visualization in Forming Virtual Exhibitions

Nikolay Evgenvich Kalenov, Sergey Alexandrovich Kirillov, Irina Nikolaevna Sobolevskaya, Aleksandr Nikolaevich Sotnikov
418-432
Abstract: The paper presents approaches to solving the problem of creating realistic interactive 3D web collections of museum exhibits. The presentation of 3D models of objects based on oriented polygonal structures is considered. The method of creating a virtual collection of 3D models using interactive animation technology is described. It is also shown how a full-fledged 3D model is constructed on the basis of individual exposure frames using photogrammetry methods. The paper assesses the computational complexity of constructing realistic 3D models. To make 3D models available to a wide range of users via the Internet, the so-called interactive animation technology is used. The paper presents the differences between full-fledged 3D models and 3D models presented in the form of interactive animation. The technology of creating 3D models of objects from the collections of the K.A. Timiryazev State Biological Museum, and of forming on their basis, within the digital library "Scientific Heritage of Russia", a virtual exhibition dedicated to the scientific activities of M.M. Gerasimov and his anthropological reconstructions, is described; it vividly demonstrates the possibility of integrating information resources by means of an electronic library. The format of virtual exhibitions makes it possible to combine partners' resources in order to provide a wide range of users with collections stored in museums, archives and libraries.
Keywords: photogrammetry, 3D-modeling, interactive animation, web-design, polygonal modeling.

Methods and Algorithms for Increasing Linked Data Expressiveness (Overview)

Olga Avenirovna Nevzorova
808-834
Abstract: This review discusses methods and algorithms for increasing the expressiveness of linked data prepared for Web publication. The main approaches to the enrichment of ontologies are considered, and the methods on which they are based and the tools for implementing these methods are described. The main stage in the general scheme of the linked data life cycle in the Linked Open Data cloud is the construction of a set of linked RDF triples. To improve the classification of data and the analysis of their quality, various methods are used to increase the expressiveness of linked data. The main ideas of these methods concern the enrichment of existing ontologies (an expansion of the basic knowledge schema) by adding or improving terminological axioms. Enrichment methods are based on methods used in various fields, such as knowledge representation, machine learning, statistics, natural language processing, formal concept analysis, and game theory.
Keywords: linked data, ontology, ontology enrichment, semantic web.

The RePEc Database and Its Russian Partner Socionet

Т. Крихель, С. Паринов
Abstract: The online economics library RePEc.org ranks second in the world (after arXiv.org) in the number of freely available scholarly materials. RePEc has a completely different model for populating its database compared with arXiv, and the content of the information it provides is also quite different. This article describes these features.
The RePEc organizational model has an open architecture. The database is open in two senses: (1) for contribution (organizations can deposit their materials) and (2) for development (developers can build various services for users). Libraries of the traditional type, including many digital libraries, are closed in both respects. The article also discusses the specifics of the functional connection between RePEc and the Socionet system.
As for the content of the information provided, the RePEc database aims to create a relational dataset about scholarly resources and related information. It should include data on all authors, papers and organizations directly involved in economics research. Such an ambitious project can be realized only if the costs of data collection are decentralized and low, and the benefits of using this information are sufficiently large.

INSPIRE Infrastructure Build-up in Estonia

М.Я. Теэ, Т.Т. Ильвес
Abstract: This paper concentrates on the methodology of complying with INSPIRE requirements in conditions where the project is under time and budgetary pressure. The goal was to provide discovery and view services for the spatial data and the corresponding metadata described in Annexes I and II of the INSPIRE Directive, and to create the Estonian GeoPortal with its subpages and administrative tools that can be used to maintain and add spatial data and metadata.
Keywords: ESRI ArcGIS for INSPIRE, Estonia, Spatial Data Infrastructure, automatic data update mechanisms.

Algorithms for the Formation of Metadata of Mathematical Retro-Collections Based on the Analysis of Structural Features of Documents

Polina Olegovna Gafurova, Alexander Michailovich Elizarov, Evgeny Konstantinovich Lipachev
238-271
Abstract:

The solutions of the main problems associated with the formation of digital mathematical collections from documents published in the pre-digital period are presented – such collections are designated in the work as retro collections. Algorithms for creating a meta description of retro collections based on the analysis of the structure of mathematical documents and the use of software tools for extracting metadata are given. The description of retro-collections formed using the developed algorithms and included in the metadata factory of the digital mathematical library Lobachevskii-DML is given. The schemes for the formation of metadata and methods for normalizing the extracted metadata in accordance with the schemes and requirements of the integrating mathematical libraries are indicated.

Keywords: Lobachevskii-DML, metadata factory, metadata management services, archive collections.

Information about Russian Research Organizations in Multilingual Data Sources

Zinaida Vladimirovna Apanovich
756-769
Abstract:

International and Russian-language data sources that provide information about Russian research-related organizations are considered. It is demonstrated that Russian-language data sources contain more information about Russian research-related organizations than most international data sources, but this information remains unavailable for English-language data sources. Experiments on comparison and integration of information about Russian research organizations in international and Russian data sources are outlined. Data sources such as GRID, Russian and English chapters of Wikipedia, Wikidata and eLIBRARY.ru are considered. The work is an intermediate step towards the creation of an open and extensible knowledge graph.

Keywords: multi-lingual knowledge graphs, identity resolution, research-related organizations, correctness.

Formalization of Processes for Forming User Collections in the Digital Space of Scientific Knowledge

Nikolay Evgenvich Kalenov, Irina Nikolaevna Sobolevskaya, Aleksandr Nikolaevich Sotnikov
433-450
Abstract: The task of forming a digital space of scientific knowledge (DSSK) is analyzed in the paper. The difference of this concept from the general concept of the information space is considered. DSSK is presented as a set containing objects verified by the world scientific community. The form of a structured representation of the digital knowledge space is a semantic network, the basic organization principle of which is based on the classification system of objects and the subsequent construction of their hierarchy, in particular, according to the principle of inheritance. The classification of the objects that make up the content of the DSSK is introduced. A model of the central data collection system is proposed as a collection of disjoint sets containing digital images of real objects and their characteristics, which ensure the selection and visualization of objects in accordance with multi-aspect user requests. The concept of a user collection is defined, and a hierarchical classification of types of user collections is proposed. The use of the concepts of set theory in the construction of DSSK allows you to break down information into levels of detail and formalize the algorithms for processing user queries, which is illustrated by specific examples.
Keywords: recursive link, knowledge cyberdomain, digital library, detail levels, data entries hierarchy.
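A toy sketch of the set-theoretic query processing idea from the abstract follows: objects matching a multi-aspect user request are selected as the intersection of the per-aspect object sets (the collection contents below are invented placeholders):

```python
# Sketch: multi-aspect selection as an intersection of per-aspect object sets.
objects_by_aspect = {
    ("type", "map"):         {"obj1", "obj4", "obj7"},
    ("type", "manuscript"):  {"obj2", "obj3"},
    ("epoch", "19th c."):    {"obj1", "obj2", "obj7"},
    ("person", "Gerasimov"): {"obj7", "obj9"},
}

def select(query):
    """A multi-aspect request is answered by intersecting the per-aspect sets."""
    sets = [objects_by_aspect.get(aspect, set()) for aspect in query]
    return set.intersection(*sets) if sets else set()

print(select([("type", "map"), ("epoch", "19th c.")]))  # -> {'obj1', 'obj7'}
```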

Improving the Quality of Metadata of Scientific Publications with Crossref Reports

Alexey Viktorovich Ermakov
1117-1136
Abstract:

Issues related to improving the quality of the metadata of scientific publications placed in the Crossref bibliographic database are considered. Crossref analyzes all the information contained in the metadata obtained from publishers of scientific publications and displays it in various reports. These reports give publishers an idea of the completeness and correctness of the bibliographic data they provide. The quality of metadata directly or indirectly affects the number of views of and links to a publication and, consequently, the ratings of scientific publications, authors and organizations.

Keywords: metadata of publications, Crossref reports, citations, ratings of scientific publications.
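As an illustration of how metadata completeness can be inspected programmatically, the sketch below queries the public Crossref REST API for a single DOI and reports which fields are present; this is an assumption-based example, not Crossref's participation reports or the author's workflow:

```python
# Sketch: check which metadata fields a Crossref work record exposes for a DOI.
import requests

def metadata_completeness(doi: str) -> dict:
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    resp.raise_for_status()
    msg = resp.json()["message"]
    checked = ("title", "author", "abstract", "reference", "license", "funder")
    return {field: bool(msg.get(field)) for field in checked}

print(metadata_completeness("10.1103/PhysRevLett.116.061102"))  # example DOI
```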

Improvement of Neural Network Model Robustness Through Generation of Attribute-Invariant Embeddings

Marat Rushanovich Gazizov, Karen Albertovich Grigorian
1142-1154
Abstract:

Model robustness to minor deviations in the distribution of input data is an important criterion in many tasks. Neural networks show high accuracy on training samples, but the quality on test samples can drop dramatically due to different data distributions, a situation that is exacerbated at the subgroup level within each category. In this article we show how the robustness of the model at the subgroup level can be significantly improved with the help of a domain adaptation approach applied to image embeddings. We found that applying an adversarial approach to constraining the embeddings gives a significant increase in accuracy metrics on a difficult subgroup in comparison with previous models. The method was tested on two independent datasets: the accuracy on the difficult subgroup of the Waterbirds dataset is 90.3 {y: waterbirds; a: land background}, and on the CelebA dataset it is 92.22 {y: blond hair; a: male}.

Keywords: robust classification, image classification, generative adversarial networks, domain adaptation.
1-25 of 227 items
© 2015-2025 Kazan Federal University; Institute of the Information Society