
Russian Digital Libraries Journal

Published since 1998
ISSN 1562-5419

Search Results

Semantic Recommendation Service for Assigning UDC Code to Mathematical Articles

Olga Avenirovna Nevzorova, Damir Albertovich Almukhametov
203-224
Abstract:

Classifying documents by assigning classifier codes is a traditional way of systematizing documents and searching for documents on a specific topic. The Universal Decimal Classification (UDC) underlies the systematization of knowledge in libraries, databases and other information repositories. In Russia, a UDC code is a mandatory attribute of all book production and of publications in the natural and technical sciences. Choosing classification codes requires analyzing the structure of the classifier tree, and the choice is traditionally made by the author of a scientific article. This article proposes a solution for automatically assigning a UDC code to a mathematical article, based on a special resource: the OntoMathPRO ontology of professional mathematics, developed at Kazan Federal University. The approach is to create "code maps" for each classifying code in the mathematics branch of the UDC tree. A "code map" is a weighted set of all mathematical named entities extracted, with the help of the OntoMathPRO ontology, from the collection of articles that carry a given UDC code.
The creation of "code maps" rests on the hypothesis that the choice of a UDC code is determined by a certain set of classifying features that can be represented by classes of the OntoMathPRO ontology. The hypothesis was tested and confirmed on a collection of mathematical articles published during 1999-2009 in the journal "Izvestiya VUZov. Mathematics".

Keywords: Universal Decimal Classification, code map, OntoMathPRO ontology, mathematical article.
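The code-map idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each article has already been reduced to a list of OntoMathPRO concept labels (the extraction step is not shown), and the function names, the relative-frequency weights, the cosine-similarity ranking and the toy UDC data are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def normalize(concepts):
    """Turn a list of concept labels into {concept: relative frequency}."""
    counts = Counter(concepts)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def build_code_maps(articles):
    """articles: iterable of (udc_code, [concept, ...]) pairs.
    Returns a hypothetical code map per UDC code: every concept extracted
    from that code's articles, weighted by its relative frequency."""
    pooled = defaultdict(list)
    for code, concepts in articles:
        pooled[code].extend(concepts)
    return {code: normalize(concepts) for code, concepts in pooled.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_udc(code_maps, concepts, top_n=3):
    """Rank UDC codes by similarity of a new article's concepts to each map."""
    profile = normalize(concepts)
    scored = [(code, cosine(profile, cmap)) for code, cmap in code_maps.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data, invented for illustration only.
maps = build_code_maps([
    ("517.5", ["function", "integral", "series"]),
    ("519.2", ["random variable", "probability", "distribution"]),
])
print(recommend_udc(maps, ["integral", "function"], top_n=1)[0][0])  # 517.5
```

Cosine similarity is used here only because it ignores differences in article length; any weighting that rewards shared ontology classes would illustrate the same hypothesis.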

An Approach to Creating an HTML Version of a Scientific Article from a Manuscript in MS Word Format for a Low-Budget Publisher

Rimma Yuryevna Skornyakova
1064-1089
Abstract:

The most common approach among scientific publishers to creating an HTML version of a journal article is to first create an XML version of the article in accordance with the NISO Journal Article Tag Suite (JATS) standard and then convert it automatically to HTML and PDF. However, obtaining an XML version from a manuscript in MS Word .docx format, which authors often use, is difficult when the manuscript contains many complex formulas and tables. Existing software either does not handle this task fully or is expensive and inaccessible to small publishers with limited budgets. This paper proposes an approach to creating an HTML version of a journal article from a .docx manuscript containing formulas in MathType format that does not require significant financial or time costs from the publisher. It also describes a currently implemented prototype converter based on this approach, which converts scientific articles from .docx to HTML and JATS XML and is applicable to KIAM preprints.

Keywords: HTML version of a scientific article, XML version of a scientific article, JATS XML, conversion of scientific articles from .docx format to HTML.

Methods and Tools Used for Preparation Scientific Articles Publications in HTML Format

Rimma Yuryevna Skornyakova
252-302
Abstract:

Alongside PDF, the traditional format for the electronic full texts of scientific articles, HTML has become increasingly widespread in recent years. It offers a number of advantages for online publication: better means of structuring content, adding multimedia and implementing various interactive and dynamic features. Consequently, the task of obtaining an HTML version of a scientific article from the original format submitted by the author has become highly topical. The article discusses various approaches to preparing HTML versions of the full texts of scientific articles and describes the software used in this process, focusing on tools for source materials in Word format.


The paper also outlines the basics of the JATS XML standard, which is widely used in the preparation of online publications of journal articles.

Keywords: HTML version of a scientific article, XML version of a scientific article, standard for the exchange of scientific articles, JATS, conversion of scientific article formats.

Preprint as the Material for an Overlay Journal

Tatyana Alekseevna Polilova
387-407
Abstract:

The open access movement has a long history: the Budapest Open Access Initiative was first announced in 2002. However, the problem of open access has not yet been fully and definitively resolved. In 2018 the European Union adopted Plan S, which called for making open access a reality by 2020. Plan S emphasizes the importance of self-archiving articles and the role of preprint archives (servers) in disseminating scientific results. Preprint archives are noted to have great potential for editorial and publishing innovation. Commercial scientific journals with restricted reader access are not giving up their positions, but even here there is some progress: journals have become less rigid in their policies towards preprints and postprints.


More and more foreign scientists are becoming adherents of the "Fair open access" movement, which offers a new organizational solution. Under this model, a journal must be founded by a scientific organization or a non-profit foundation, which hires a group of contractors to provide editorial and publishing services. Editors and publishers should have no commercial interests of their own, and the journal should be funded by joint contributions from organizations.


The article considers a modern type of online scientific journal: the overlay journal. The cost of an issue of an overlay journal is so low that the journal can easily implement the "free for the author, free for the reader" scheme. An overlay journal is built on public preprint servers: it peer-reviews articles taken from the archive, and if an article is accepted for publication, its metadata is published on the journal website and the corrected full text is re-deposited in the archive. This workflow does not overload the archive's functionality, yet it reduces the financial burden on the overlay journal.

Keywords: scientific journal, Fair open access, Open archive, server of preprints, overlay journal.

Analysis of the Distribution of Key Terms in Scientific Articles

Svetlana Aleksandrovna Vlasova, Nikolay Evgenievich Kalenov, Irina Nikolaevna Sobolevskaya
35-51
Abstract:

Subject ontologies of individual thematic subspaces, which contain the basic concepts of the corresponding scientific area, are among the main components of the Common Digital Space of Scientific Knowledge (CDSSK). At the initial stage, constructing a subject ontology requires forming an array of key terms in a given scientific area and then establishing links between them. A similar task arises in compiling encyclopedias, where a list of articles (slots) that determines their content must be generated. One source for forming the key-term array can be the metadata of articles published in leading scientific journals, namely the authors' key terms ("keywords" in the terminology of journal editors). To decide whether this approach can be used to form subject ontologies, the array of authors' key terms must first be analyzed, both for how well it actually corresponds to the main research areas of the given branch of science and for the frequency distribution of individual terms. This article presents the results of a frequency analysis of authors' key terms in Russian and English, carried out by software processing of several thousand articles from leading Russian journals in mathematics, computer science and physics indexed in the MathNet database. The distributions of key terms (as phrases) and of individual words were assessed for conformity with Bradford's law, and the cores of key terms within each thematic direction were identified.

Keywords: digital space of scientific knowledge, subject ontologies, encyclopedia articles, key terms, article metadata, frequency analysis.
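Bradford's law, as usually stated, says that if items are ranked by productivity and split into zones carrying equal shares of the total, the zone sizes grow roughly as 1 : n : n². A minimal sketch of that zoning step applied to key-term occurrences is shown below; the function name, the three-zone split and the toy data are assumptions for illustration, not the authors' procedure.

```python
from collections import Counter


def bradford_zones(term_occurrences, zones=3):
    """Rank terms by descending frequency and cut the ranking into `zones`
    groups, each covering roughly an equal share of all occurrences."""
    ranked = Counter(term_occurrences).most_common()
    total = sum(freq for _, freq in ranked)
    share = total / zones
    result, current, cumulative = [], [], 0
    for term, freq in ranked:
        current.append(term)
        cumulative += freq
        # Close a zone once the running total reaches its boundary;
        # the last zone takes everything that remains.
        if len(result) < zones - 1 and cumulative >= share * (len(result) + 1):
            result.append(current)
            current = []
    result.append(current)
    return result


# Toy occurrence list, invented for illustration only.
occurrences = (["a"] * 30 + ["b"] * 20 + ["c"] * 10
               + [t for t in "defghi" for _ in range(5)])
zone_sizes = [len(z) for z in bradford_zones(occurrences)]
print(zone_sizes)  # [1, 2, 6]
```

With real data one would compare the ratios of successive zone sizes: roughly constant ratios are consistent with Bradford's law, and the first zone corresponds to the core of key terms.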

The Rating of the Journal in the Bibliographic Database

Mikhail Mikhailovich Gorbunov-Posadov, Tatyana Alekseevna Polilova
1060-1089
Abstract:

Tools for building ratings of scientific journals are among the popular services of bibliographic databases. The task of building a rating is usually divided into two main subtasks: determining the reference group of journals and calculating the rating indicator for the journals in this group. Practice shows that a correct comparison of journals requires restricting the reference group to journals of a single subject area. If methodological errors are made when selecting the reference group, a journal's indicator values in the rating may differ greatly from what is expected.


For example, in the Russian Science Citation Index (RSCI) ranking of journals by two-year impact factor in the thematic area “Mathematics”, classical fundamental mathematical journals, contrary to expectations, do not reach the top positions. The top positions are taken by journals for which mathematics is not the dominant discipline. Analysis of statistical data on the subjects of published articles and on citations in the journals that lead the RSCI rating shows that the multidisciplinary nature of these journals significantly influenced their rating indicators.


This anomaly suggests that in this case the rating should have been calculated not from all articles of a journal, but only from those related to the given thematic area. At the same time, the existing scheme of thematic classification itself raises questions. The increasingly popular "bottom-up" classification, which operates on a representative array of articles, appears more promising. Here thematic clusters are identified on the basis of the proximity of articles, interpreted as the proximity of their bibliographic references. The thematic affiliation of an article is then not assigned by a decision of the author or the editorial board, but is calculated strictly formally from its reference list.

Keywords: scientific publication, citation, rating of journals, thematic classification, impact factor, multidisciplinary, bibliographic reference, co-citation, bottom-up classification, thematic clustering, Citation Topics.
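The idea of measuring the proximity of articles by their reference lists is known as bibliographic coupling. The sketch below illustrates it under loudly stated assumptions: the Jaccard normalization, the fixed threshold and the grouping by connected components are choices made for brevity, whereas production bottom-up schemes typically use more elaborate community-detection algorithms; all names and data are hypothetical.

```python
from itertools import combinations


def coupling(refs_a, refs_b):
    """Normalized bibliographic coupling: shared references divided by all
    distinct references of the two articles (Jaccard index)."""
    a, b = set(refs_a), set(refs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topic_clusters(articles, threshold=0.3):
    """articles: {article_id: [reference_id, ...]}.
    Links every pair whose coupling reaches `threshold` and returns the
    connected components of the resulting graph as thematic clusters."""
    neighbors = {aid: set() for aid in articles}
    for x, y in combinations(articles, 2):
        if coupling(articles[x], articles[y]) >= threshold:
            neighbors[x].add(y)
            neighbors[y].add(x)
    seen, clusters = set(), []
    for start in articles:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search over linked articles
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbors[node] - seen)
        clusters.append(component)
    return clusters


# Toy reference lists, invented for illustration only.
papers = {"p1": ["r1", "r2", "r3"], "p2": ["r2", "r3", "r4"], "p3": ["r9"]}
clusters = topic_clusters(papers)
print(sorted(sorted(c) for c in clusters))  # [['p1', 'p2'], ['p3']]
```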

Information System for Registering the Result of Scientific Institution Employees’ Intellectual Activity

Svetlana Aleksandrovna Vlasova, Nikolay Evgenevich Kalenov
218-237
Abstract:

The article describes a typical object-oriented Web system, developed by specialists of the JSCC RAS, for storing and providing various reference and statistical data on the scientific works of the employees of an institution (or group of institutions). The system contains information about employees' publications and about the reports they have given at scientific conferences, symposia and seminars. It works with objects belonging to interrelated classes, such as "author", "organization", "publication", "report" and "event". The metadata profile of each class includes the attributes needed to obtain detailed information both about an individual object of that class and about a group of objects linked to specified attribute values of objects of other classes: for example, a list of articles by employees of a given organization published in a given journal during a given period. A distinctive feature of the system is the concept of "equivalent" objects. Such objects include "persons" corresponding to the same author whose surname is spelled differently in the bibliographic descriptions of publications, organizations with different versions of their names, and articles published without changes in different languages. The article describes the features of the system and its user interface in detail and gives examples of specific queries.

Keywords: databases, research results accounting, WEB-based system, network technologies, publication activity analysis, software.
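The abstract does not say how equivalence links between records are stored. One common way to maintain such groups is a union-find (disjoint-set) structure; the sketch below assumes that an administrator declares pairs of equivalent records, and the class name, method names and sample surname variants are hypothetical.

```python
class EquivalenceRegistry:
    """Groups records that describe the same real-world entity
    (e.g. spelling variants of an author's surname) using union-find."""

    def __init__(self):
        self._parent = {}

    def _find(self, item):
        """Return the representative record of `item`'s group."""
        self._parent.setdefault(item, item)
        root = item
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited record straight at the root.
        while self._parent[item] != root:
            next_item = self._parent[item]
            self._parent[item] = root
            item = next_item
        return root

    def mark_equivalent(self, a, b):
        """Merge the groups of records `a` and `b`."""
        self._parent[self._find(a)] = self._find(b)

    def equivalents(self, item):
        """All records known to describe the same entity as `item`."""
        root = self._find(item)
        return {x for x in list(self._parent) if self._find(x) == root}


registry = EquivalenceRegistry()
registry.mark_equivalent("Kalenov N.E.", "Kalenov N.Ye.")
registry.mark_equivalent("Kalenov N.Ye.", "Kalyonov N.E.")
print(sorted(registry.equivalents("Kalenov N.E.")))
# ['Kalenov N.E.', 'Kalenov N.Ye.', 'Kalyonov N.E.']
```

Because equivalence is transitive, a query for any spelling variant returns the whole group, which is what a search over "equivalent" persons or organizations needs.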

Developing Technological Cycle of Search System that Aggregates Citations by Books

Roman Valerievich Mosolov
246-256
Abstract:

In this article, we describe the technological cycle for developing a search system over 14 philosophical books by L.A. Seklitova and L.L. Strelnikova. The cycle consisted of 6 stages. The ideas of the article may be useful for designing and developing software that aggregates citations from book series, monographs, scientific periodicals or scientific articles. For example, this experience may help in creating customized links to secondary sources, which are needed when writing scientific articles and designing presentations in pedagogy. The search system is the result of one year of work by the author and a group of about 30 volunteers. The system is implemented as a service integrated into a web application. The technology stack includes Jade, CSS, JS, Node.js, Express.js, ESLint and Jest.

Keywords: search system, searching system, search by books, search by book, search by citations, citations aggregator, aggregator by citations, books aggregate, citations data aggregators, develop search engine.

Algorithm for linking translated articles using authorship statistics

Alexander Sergeevich Kozitsyn, Sergey Alexandrovich Afonin, Andrey Alexandrovich Zenzinov
494-505
Abstract: In recent decades, scientometric techniques have been used to stimulate research activity. The number of published articles and the number of citations they receive are among the most important scientometric parameters. In an automated environment, where publication metadata is gathered from various sources, correctly linking original papers with their translations into other languages is extremely important. In the paper we show that known text similarity measures are inefficient for the article linkage problem. We propose a method for semi-automatic article linkage that uses only statistical data on the authors' publication activity. The approach can link articles without training for the language of the translation. The method was evaluated on a real-world collection of publication metadata from the ISTINA information system.
Keywords: bibliographic data, graph analysis, translation, article, statistics, scientometrics, citation, automated systems.
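A heavily simplified sketch of linking by authorship alone is shown below. It does not reproduce the authors' statistical model: the exact author-set match, the one-year window and the rule "link automatically only when the candidate is unique, otherwise send to a human" are assumptions chosen to show why such a method can be language-independent and semi-automatic. All names and records are hypothetical.

```python
from collections import defaultdict


def link_translations(originals, translations, max_year_gap=1):
    """originals/translations: lists of dicts with 'id', 'authors'
    (language-independent author identifiers) and 'year'.
    Returns (auto_links, needs_review): a translation is linked
    automatically only when it is the sole candidate with the same
    author set within `max_year_gap` years; ambiguous cases go to a human."""
    by_authors = defaultdict(list)
    for t in translations:
        by_authors[frozenset(t["authors"])].append(t)
    auto_links, needs_review = [], []
    for o in originals:
        candidates = [
            t["id"]
            for t in by_authors.get(frozenset(o["authors"]), [])
            if abs(t["year"] - o["year"]) <= max_year_gap
        ]
        if len(candidates) == 1:
            auto_links.append((o["id"], candidates[0]))
        elif candidates:
            needs_review.append((o["id"], candidates))
    return auto_links, needs_review


# Toy metadata, invented for illustration only.
ru = [{"id": "ru1", "authors": [1, 2], "year": 2019},
      {"id": "ru2", "authors": [3], "year": 2020}]
en = [{"id": "en1", "authors": [2, 1], "year": 2020},
      {"id": "en2", "authors": [3], "year": 2020},
      {"id": "en3", "authors": [3], "year": 2021}]
auto, review = link_translations(ru, en)
print(auto)    # [('ru1', 'en1')]
print(review)  # [('ru2', ['en2', 'en3'])]
```

No text is compared at any point, so the sketch works the same way for any pair of languages; its weakness, as the review list shows, is prolific authors with several candidate papers per year.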

Technology for Filling Subject Ontologies of the Scientific Knowledge Space

Nikolay Evgenievich Kalenov
101-115
Abstract:

In the context of this article, a subject ontology is understood as a set of key concepts related to a certain field of science, together with their semantic connections, supplemented by the indexes of various classification systems describing this field. Subject ontologies are a necessary component of each subspace of the Unified Digital Space of Scientific Knowledge (DSSK). This article presents the results of research on constructing subject ontologies based on the automated system created for supporting terminological dictionaries, and suggests a methodology for identifying new key terms in a particular field of science. The methodology combines existing classification systems with citation databases, such as Web of Science and Scopus for English-language publications and the Russian Science Citation Index for Russian-language publications. It involves dividing the scientific field into sections according to the selected classification system, extracting from the citation databases the core of articles related to each section, and extracting new author keywords from these articles; combined with the corresponding sections of the classification systems, these keywords should form the basis of the subject ontologies of the field.

Keywords: scientific digital space, subject ontology, citation databases, keywords, thesaurus, classification systems.

Extraction Of Wikidata Knowledge For The Metadata Formation For Documents of Digital Mathematical Collections

Polina Olegovna Gafurova, Alexander Michailovich Elizarov, Evgeny Konstantinovich Lipachev
1023-1059
Abstract:

Methods for creating digital mathematical collections that include unstructured sets of documents are presented. These sets contain materials from scientific conferences, as well as articles from the archives of mathematical journals of the "pre-digital" period.


Using the software tools of the metadata factory of the digital mathematical library Lobachevskii DML, a mandatory set of metadata for digital collection documents was formed. To refine and replenish the metadata sets, knowledge extraction methods from Wikidata were used.


To search Wikidata for information about digital collection documents and their authors, a system of SPARQL queries has been developed. A set of Wikidata entities is defined that determines the search features and the subsequent filtering of the results.


Methods for clarifying and supplementing the bibliographic references given in the articles are proposed. When forming the metadata of documents in retro-digitized collections, Wikidata was searched for the years of life of the articles' authors and for URLs of web pages with information about the articles and their authors. The results of forming several new digital collections of the Lobachevskii-DML digital library are presented.

Keywords: Wikidata, metadata, metadata factory, digital mathematical collection, retrodigitized mathematical collection, Digital Mathematical Libraries, Lobachevskii-DML.

Automatic Replenishment of Metadata of Digital Publications using Semantic Services of the Internet

Polina Olegovna Gafurova
164-186
Abstract:

The article describes approaches to replenishing the metadata of documents in the electronic collections of a digital mathematical library, using open Semantic Web resources as the source of additional data. For this purpose, software tools have been developed to search for the necessary data and include it in a metadata set. A separate block of metadata in a scientific article is formed from the author affiliations given in the document. Typically, the affiliation information in a document is insufficient to generate a complete set of metadata. A method has been developed for supplying author affiliation metadata using ROR, an open registry of scientific organization identifiers, together with means of linking ROR to other semantic resources. The method was applied to the article collections of the journal “Digital Libraries” for 2021–2022.


The article also describes a method for connecting new electronic collections to the Lobachevskii-DML digital mathematical library and a method for transforming metadata into a digital format available for download.

Keywords: ROR, Wikidata, digital libraries, affiliation metadata, Lobachevskii-DML.

Recommender System in the Process of Scientific Peer Review in Mathematical Journal

Alexander Mikhailovich Elizarov, Evgeny Konstantinovich Lipachev, Shamil Makhmutovich Khaydarov
708-732
Abstract: An approach is proposed for organizing expert evaluation of a scientific document submitted to a mathematical journal. The restriction to mathematics stems from the use of the Mathematics Subject Classification (MSC). A recommender system is presented that creates a list of possible experts for peer-reviewing a mathematical article. The recommender system uses the MSC2020 codes specified by the author of the article; if MSC2000 or MSC2010 codes are given instead, they are automatically converted to MSC2020 codes. For each expert, the system maintains a personal profile containing a set of MSC2020 codes supplemented by numerical characteristics: weights calculated for each code according to a scheme that accounts for the expert's competencies, preferences and refusals to participate in the review procedure. This set is updated automatically whenever the expert is included in a list of possible reviewers: the weights of some codes increase or decrease, and new codes are added. The recommender system is implemented as an integrated tool (plug-in) for the Open Journal Systems (OJS) platform. The method has been tested in the information system of the Lobachevskii Journal of Mathematics (https://ljm.kpfu.ru).
Keywords: scientific journal information system, Open Journal Systems, peer review workflow, automated reviewers selection, Mathematics Subject Classification 2010, Lobachevskii Journal of Mathematics.
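The profile-and-weights mechanism described above can be illustrated with a small sketch. The abstract does not give the actual weighting scheme, so the additive score, the fixed step of 0.1, the function names and the sample profiles are assumptions, and the MSC2000/MSC2010 conversion step is omitted; this is not the OJS plug-in's API.

```python
def rank_experts(profiles, article_codes, top_n=5):
    """profiles: {expert: {msc2020_code: weight}}.
    Hypothetical score: sum of the expert's weights on the article's codes;
    experts with no matching codes are dropped."""
    scores = {
        expert: sum(weights.get(code, 0.0) for code in article_codes)
        for expert, weights in profiles.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(expert, score) for expert, score in ranked if score > 0][:top_n]


def update_profile(weights, article_codes, accepted, step=0.1):
    """Raise the weights of the article's codes when the expert agrees to
    review (adding unseen codes), lower them on refusal (never below 0)."""
    for code in article_codes:
        if accepted:
            weights[code] = weights.get(code, 0.0) + step
        elif code in weights:
            weights[code] = max(0.0, weights[code] - step)
    return weights


# Toy profiles, invented for illustration only.
profiles = {"A": {"30C45": 0.8, "30C50": 0.4}, "B": {"35Q30": 0.9}}
print(rank_experts(profiles, ["30C45", "30C50"]))
profile_a = update_profile(dict(profiles["A"]), ["30C45", "30C80"], accepted=True)
```

An additive score keeps the ranking easy to explain to editors; a real system would also need to exclude co-authors and the article's own authors.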

Some software instruments for the automated replenishment of the terminological dictionary of the subject area

Roman Anatolyevich Rumyantsev, Olga Avenirovna Nevzorova
91-122
Abstract:

The article describes OntoDictionary, an application designed to work with scientific mathematical articles and with ontologies created in Protégé. The application can create an ontology dictionary, split its elements into concepts and process them in Boolean search. It also provides functionality for extracting certain noun phrases from mathematical articles. The novelty lies in how noun phrases containing formulas are created and processed; formulas are processed regardless of their type. A procedure for selecting term candidates has been implemented. Using this functionality, a number of experiments were performed with OntoMathPRO, an ontology of mathematical knowledge also developed at Kazan Federal University.

Keywords: mathematical knowledge, ontology, concept, search index, Noun Phrase, term candidates.

About the license agreement for the work-for-hire publication

Tatyana Alekseevna Polilova
119-141
Abstract: Under the Civil Code of the Russian Federation, a research paper is a result of intellectual activity protected by the state. The author of the research paper holds the right of authorship, the right to a name and other non-property rights. If the paper is created in the course of the author's official duties, the exclusive right to it belongs to the employer.
With the employer's consent, the author concludes a license agreement with the publisher for publication of the paper on the terms proposed by the publisher. Signing the license agreement does not transfer the exclusive right to the publisher. Even if the employer has instructed the author to conclude a copyright agreement with the publisher under an exclusive license, the employer retains the right to use the work, including the right to publish it on its website.
The author (copyright holder) always retains the right to create derivative works. License terms, often imposed by publishers, that restrict the author's right to create works based on previously published articles have no legal force.
Publication by the author of derivative works containing fragments of the author's previous paper should not be considered a violation of publishing ethics. The term "self-plagiarism" is incorrect.
The Civil Code of the Russian Federation provides for a simple (non-exclusive) license that allows several publishers to publish an article without modification. Publishing an article in several outlets is one of the legal ways for the author (copyright holder) to exercise the right to wide dissemination of a work.
Keywords: research paper, work-for-hire, exclusive right, license agreement, copyright agreement, exclusive license, simple license, derivative work, text recycling, redundant publication.

The Use of Thematic Analysis Methods in Scientometric Systems

Alexander Sergeevich Kozitsyn, Sergey Alexandrovich Afonin, Dmitry Alekseevich Shachnev
315-338
Abstract:

Modern scientometric and citation systems use various mechanisms of thematic search and thematic filtering of information. In most cases, thematic analysis of articles and journals relies on a full-text approach, which has a number of limitations. Algorithms based on graph analysis, used independently or in conjunction with full-text algorithms, eliminate these limitations and improve the completeness and accuracy of subject search. The algorithm developed by the authors and presented in this work uses the co-authorship graph to analyze the thematic proximity of journals. The algorithm is insensitive to the language of the journal and finds similar journals in different languages, which is difficult for algorithms based on full-text analysis. The algorithm was tested in the IAS ISTINA scientometric system. In the interface developed for this purpose, the user selects one journal close to their own subject area, and the system automatically generates a selection of journals that may interest the user both for studying the materials they contain and for publishing the user's own articles. In the future, the algorithm can be adapted to search for similar conferences, collections of publications and scientific projects. Such a tool should increase the publication activity of young researchers, the citation rate of articles and cross-citation between journals. The results of determining thematic proximity between journals, collections, conferences and scientific projects can also be used to build rules in access-control models based on domain ontologies.

Keywords: thematic classification, bibliographic data, co-authorship graph, information systems.
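The abstract does not detail the ISTINA algorithm, so the following is only a minimal sketch of journal similarity over a co-authorship graph under assumed rules: each journal is represented by its authors plus their direct co-authors (one hop in the graph), and journals are ranked by the Jaccard similarity of these sets. Because authors are identified by IDs rather than text, the comparison does not depend on the language of the journals. Names and data are hypothetical.

```python
from collections import defaultdict


def similar_journals(papers, target, top_n=5):
    """papers: list of (journal, [author_id, ...]) records.
    Ranks other journals by the Jaccard similarity between their
    one-hop co-authorship neighborhood and that of `target`."""
    coauthors = defaultdict(set)
    journal_authors = defaultdict(set)
    for journal, authors in papers:
        journal_authors[journal].update(authors)
        for a in authors:
            coauthors[a].update(x for x in authors if x != a)

    def neighborhood(journal):
        # The journal's own authors plus everyone they have co-authored with.
        hood = set(journal_authors[journal])
        for a in journal_authors[journal]:
            hood |= coauthors[a]
        return hood

    base = neighborhood(target)
    scores = []
    for journal in journal_authors:
        if journal == target:
            continue
        other = neighborhood(journal)
        union = base | other
        if union:
            scores.append((journal, len(base & other) / len(union)))
    scores.sort(key=lambda kv: kv[1], reverse=True)
    return scores[:top_n]


# Toy records, invented for illustration only.
records = [("J1", ["a", "b"]), ("J2", ["b", "c"]), ("J3", ["d"]), ("J2", ["e"])]
print(similar_journals(records, "J1"))  # [('J2', 0.75), ('J3', 0.0)]
```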

Russian Scientific Publication — 2019

382-389
Abstract: The article presents the events of the past year in the world of Russian scientific publishing. Some academic journals that switched to open access in 2018 are slowly sliding back towards paid access. The European Union has announced Plan S for the mass transition of scientific journals to open access. New models of scientific publication are being introduced. The publication reporting requested by the Ministry of Education and Science in 2019 does not take into account the size of an article's readership. Neither the Ministry of Education and Science nor the Higher Attestation Commission (HAC) encourages publication in the public domain. The Russian Science Citation Index has begun fighting the widespread fraudulent trade in citations, but the HAC is not interested in this effort. The contradictory term "self-plagiarism" has spread widely, and this label is used to stigmatize authors and journals for repeated publication.
Keywords: open access, Plan S, administrative assessment of articles, serial publications, online reader, h-index, Dissernet, self-plagiarism.

Development of the Information System for Registering the Result of Scientific Institution Employees’ Intellectual Activity

Svetlana Aleksandrovna Vlasova, Nikolay Evgenevich Kalenov
770-793
Abstract:

The article describes a Web system developed by the authors that implements services for forming and providing multifaceted information about the results of scientific activity (publications, copyright certificates and reports at scientific events) of the employees of an organization or a group of organizations. The system serves both end users interested in obtaining specific data and administrative staff who prepare reports for the parent organization. The information base of the system contains metadata on the following classes of objects: persons (authors); organizations and their subdivisions; publications at the analytical, monographic and summary levels; copyright certificates; scientific events (conferences, symposia, seminars); and reports. The system includes two modules: an administrative module for entering and editing data, and a user module, which is a specialized search engine that finds information, visualizes it, provides navigation among related resources and exports data. A distinctive feature of the system is the concept of “equivalent” objects. Objects are considered equivalent if they are represented in the system by different metadata but refer to the same real-world entity. Such objects include “persons” corresponding to one author whose surname is spelled differently in the bibliographic descriptions of publications, organizations with different variants of their names, and articles published unchanged in various languages. In line with modern requirements for publication reporting, the system records the sources of research funding and the affiliations indicated in the articles for each author.

Keywords: scientific works, scientific activity, automated system, database, management reports, network technologies.

Intelligent search of complex objects in Big Data

Alexander Mikhailovich Gusenkov
40-76
Abstract: This article considers an approach to intelligent search of complex objects in different types of texts with structural markup, which can be used for Big Data processing. We study two types of input data: relational databases, which use their schemas as structural markup, and full-text scientific documents containing mathematical expressions (formulae). For such full-text documents we propose additional automated markup to enable formula search. In both cases we use natural language texts, which are semi-structured data, as the data source for building an ontology and, later, for conducting search. For relational databases these are the comments on tables and table attributes; for scientific documents (articles, monographs, etc.) it is the text content of the marked-up documents.
Keywords: Big Data, semantic search, semi-structured data, ontology, relational databases, science texts, mathematical expressions markup.

RSCI as a Mirror of Publication Activity of RAE Members

Yuri Evgenievich Polyak
543-562
Abstract: Based on information from open sources, a table was compiled reflecting the indicators of 128 full members of the Russian Academy of Education (RAE) in the Russian Science Citation Index (RSCI). The main results are given in a condensed form and compared with the results of a similar study performed several years earlier. The conclusions and features of the RSCI as an analytical tool are discussed.
Keywords: Russian Academy of Education, Russian Science Citation Index, publication activity.

Using Web-Quest Technology in Cybersecurity Training

Olga Troitskaya, Eva Vohtomina
195-201
Abstract: The article argues for the need to develop schoolchildren's skills of safe behavior in cyberspace. One way to develop these skills is web-quest technology. The article briefly describes this technology and gives an example of its use in teaching the basics of cybersecurity.
Keywords: web-quest, cybersecurity, cyberthreat, safe behavior.

Organization of Calculations and Work with Memory in the Educational Programming Language SYNHRO

Lidia Vasiljevna Gorodnyaya
566-599
Abstract:

The article discusses a number of design decisions made in the project of the educational programming language Synchro, which is being developed at the Laboratory of Information Systems of the A.P. Ershov IIS SB RAS. The language is designed to introduce learners to the basic phenomena of process interaction and of controlling computations over shared memory, with a focus on the functional programming paradigm. It is aimed at primary and secondary school students, as well as junior university students and non-professionals. Training draws on the experience of operating toy robots that move on a checkered board. The article will be of interest to anyone concerned with the problems of modern computer science, programming and information technology, especially parallel computing on supercomputers and distributed systems and, more generally, the use of multiprocessor systems.

Keywords: educational programming languages, virtual machine, command system, functional programming, data recovery, memory release, multithreaded programs, parallel computing, shared memory, process interaction.

Creating a comparison method for relational tables

Azat Shavkatovich Yakupov, Daniil Andreevich Klinov
173-183
Abstract: The article is devoted to creating a fast method for comparing large data tables in relational database management systems. An efficient method for comparing tables in relational systems is highly relevant today. Existing solutions were studied. The algorithm presented in the article is built on a counting Bloom filter («Countable Bloom filter»), a probabilistic data structure, and the Monte Carlo method. The proposed solution is unique in its area, as it requires the least time. A probabilistic model of the algorithm is constructed, and the algorithm can be parallelized.
Keywords: multiset, comparison of relational tables, heterogeneous system, Countable Bloom filter, Monte Carlo method, replication, Oracle, PostgreSQL, Probabilistic data structure.
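
The abstract does not give the algorithm itself. The sketch below shows one common way a counting Bloom filter can compare two tables as multisets of rows: rows of the first table increment the counters, rows of the second decrement them, and an all-zero filter means the tables are equal with high probability. It is an illustration under stated assumptions, not the authors' method; the Monte Carlo part is not shown, and the names, the filter size, the hash count and the repr-based row serialization are hypothetical choices.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter with signed counters, used to compare multisets."""

    def __init__(self, size: int = 1024, num_hashes: int = 3) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _positions(self, item: bytes) -> list[int]:
        # Derive k counter positions from SHA-256 digests salted with the hash index.
        return [
            int.from_bytes(hashlib.sha256(i.to_bytes(4, "little") + item).digest()[:8], "little")
            % self.size
            for i in range(self.num_hashes)
        ]

    def add(self, item: bytes, delta: int = 1) -> None:
        """Increment (delta=+1) or decrement (delta=-1) the item's counters."""
        for pos in self._positions(item):
            self.counters[pos] += delta

    def merge(self, other: "CountingBloomFilter") -> None:
        """Add another filter's counters element-wise (e.g. from a parallel worker)."""
        if (self.size, self.num_hashes) != (other.size, other.num_hashes):
            raise ValueError("filters must share size and number of hashes")
        self.counters = [a + b for a, b in zip(self.counters, other.counters)]

    def is_zero(self) -> bool:
        return not any(self.counters)


def row_key(row: tuple) -> bytes:
    """Serialize a row deterministically; repr() is a simplifying assumption."""
    return repr(row).encode("utf-8")


def probably_equal(table_a, table_b, size: int = 1024, num_hashes: int = 3) -> bool:
    """Compare two tables as multisets of rows, ignoring row order.

    Equal multisets always leave every counter at zero. Unequal multisets leave
    a non-zero counter unless the leftover rows' counters happen to cancel out,
    which is unlikely for a reasonably large filter.
    """
    cbf = CountingBloomFilter(size, num_hashes)
    for row in table_a:
        cbf.add(row_key(row), +1)
    for row in table_b:
        cbf.add(row_key(row), -1)
    return cbf.is_zero()
```

Because the counters are additive, filters built over separate chunks of a table (for example, on separate workers) can be summed with merge before the zero check, which is one way such a comparison lends itself to parallelization.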

Application of the Douglas-Peucker Algorithm in Online Authentication of Remote Work Tools for Specialist Training in Higher Education Group of Scientific Specialties (UGSN) 10.00.00

Anton Grigorievich Uymin, Vladimir Sergeyevich Grekov
679-694
Abstract:

In today's world, digital technologies are penetrating all aspects of human activity, including education and work. Since 2019, when the world's educational systems began actively shifting to distance learning in response to global challenges, there has been an urgent need to develop and deploy reliable identification and authentication technologies. These technologies are needed to ensure the authenticity of students' work and to protect against falsification of academic achievements, especially in higher education programs of the enlarged group of specialties and fields of study (UGSN) 10.00.00 “Information Security”, where laboratory and practical work play a key role in the educational process.

The problem lies in the need to optimize the flow of incoming data, which, first, can cause overfitting of the neural-network core of the recognition system and, second, places excessive demands on network bandwidth. Solving this problem requires efficient preprocessing of gesture data that simplifies gesture trajectories while preserving their key features.

This article proposes the use of the Douglas–Peucker algorithm for preliminary processing of mouse gesture trajectory data. This algorithm significantly reduces the number of points in the trajectories, simplifying them while preserving the main shape of the gestures. The data with simplified trajectories are then used to train neural networks.
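
A minimal sketch of the classic recursive Douglas–Peucker algorithm is shown below, assuming a gesture trajectory is a list of (x, y) points. The abstract does not state the tolerance epsilon or whether timestamps are kept, so both are assumptions here, and the function names are illustrative.

```python
import math

Point = tuple[float, float]


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the line through a and b (or to a if a == b)."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points: list[Point], epsilon: float) -> list[Point]:
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the segment first-last.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = douglas_peucker(points[: index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    # All interior points lie within epsilon of the chord: keep only endpoints.
    return [first, last]


# Hypothetical mouse trajectory.
trajectory = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 5.0), (4.0, 6.0), (5.0, 7.0)]
print(douglas_peucker(trajectory, epsilon=0.5))
```

A larger epsilon discards more points, so the reduction ratio reported in the abstract depends on the chosen tolerance and on the data. The recursion depth can approach the number of points in the worst case, so very long trajectories may call for an iterative variant.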


The experimental part of the work showed that the application of the Douglas–Peucker algorithm allows for a 60% reduction in the number of points in the trajectories, leading to an increase in gesture recognition accuracy from 70% to 82%. Such data simplification contributes to speeding up the neural networks' training process and improving their operational efficiency.


The study confirmed the effectiveness of using the Douglas–Peucker algorithm for preliminary data processing in mouse gesture recognition tasks. The article suggests directions for further research, including the optimization of the algorithm's parameters for different types of gestures and exploring the possibility of combining it with other machine learning methods. The obtained results can be applied to developing more intuitive and adaptive user interfaces.

Keywords: authentication, biometric identification, remote work, distance learning, Douglas–Peucker algorithm, data preprocessing, neural network, HID devices, mouse gesture trajectories, data optimization.

Development of a Method for User Segmentation using Clustering Algorithms and Advanced Analytics

Daniil Andreevich Klinov, Karen Albertovich Grigorian
137-147
Abstract:

The article is devoted to building an effective solution for user segmentation. It analyzes existing user segmentation services, approaches to user segmentation (ABCDx segmentation, demographic segmentation, segmentation based on a user journey map) and clustering algorithms (K-means, Mini-Batch K-means, DBSCAN, Agglomerative Clustering, Spectral Clustering). The aim of this study is a “flexible” segmentation solution that adapts to each user sample. Analysis of variance (ANOVA) and clustering quality metrics are also used to assess the quality of user segmentation. Building on these analyses, an effective user segmentation solution has been developed using advanced analytics and machine learning.

Keywords: Segmentation, clustering, analysis of variance, machine learning, advanced analytics, ANOVA test, product analytics.
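
As a hedged illustration of combining clustering with ANOVA, the sketch below runs plain k-means (Lloyd's algorithm) on user feature vectors and then computes the one-way ANOVA F statistic of one feature across the resulting clusters. It uses only the standard library to stay self-contained; in practice, library implementations such as scikit-learn's KMeans and SciPy's scipy.stats.f_oneway would typically be used. The features and data are hypothetical.

```python
import random
from statistics import fmean

Vector = list[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: list[Vector], k: int, iterations: int = 100, seed: int = 0) -> list[int]:
    """Plain Lloyd's k-means; returns a cluster label for every user."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]  # k randomly chosen users
    labels: list[int] = []

    def nearest(p: Vector) -> int:
        return min(range(k), key=lambda c: squared_distance(p, centroids[c]))

    for _ in range(iterations):
        # Assignment step: attach each user to the nearest centroid.
        new_labels = [nearest(p) for p in data]
        if new_labels == labels:
            break  # converged: no user changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [fmean(col) for col in zip(*members)]
    return labels


def anova_f(values: list[float], labels: list[int]) -> float:
    """One-way ANOVA F statistic of one feature across clusters.

    F = between-cluster mean square / within-cluster mean square; a larger F
    means the clusters differ more strongly on this feature.
    """
    groups: dict[int, list[float]] = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two clusters and more values than clusters")
    grand = fmean(values)
    means = {lab: fmean(g) for lab, g in groups.items()}
    ss_between = sum(len(g) * (means[lab] - grand) ** 2 for lab, g in groups.items())
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf")  # every cluster is constant on this feature
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical users: [sessions per week, average order value].
users = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.5], [8.0, 50.0], [8.5, 52.0], [7.9, 49.0]]
labels = kmeans(users, k=2)
f_value = anova_f([u[1] for u in users], labels)
print(labels, round(f_value, 1))
```

A large F for a feature suggests the segments are well separated on it, while an F near zero suggests that feature does not distinguish the segments; this is one way ANOVA can complement internal clustering metrics when judging a segmentation.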


© 2015-2026 Kazan Federal University; Institute of the Information Society