Published: 20.04.2026
Full Issue
Part 1. Special Issue "Scientific Services & Internet 2025", II
Russian-English Dataset and Entity Alignment in Knowledge Graphs with Unmatchable Entities
In recent years, interest in knowledge graphs (KGs) has increased exponentially in both the scientific and industrial communities. Integration of various KGs is a pressing problem and is used, for example, to develop complex digital twins of industrial systems. Knowledge graph integration is also necessary when combining KGs extracted from natural language texts using large language models. One component of solving the KG integration problem is entity alignment (EA), which attempts to identify entities in different KGs that describe the same real-world object. In practice, many entities in real KGs have no equivalents in other KGs. In particular, each knowledge graph fragment extracted from a single publication may have its own structure of entity names and identifiers, which significantly complicates the task of identifying entities. This paper describes experiments on entity alignment in the presence of unmatchable entities using a Russian-English dataset as an example.
Modeling Content Fragments of the Common Digital Space of Scientific Knowledge
The article presents new results of research related to the formation of the Common Digital Space of Scientific Knowledge (CDSSK). This work has been carried out since 2019 in a number of academic organizations, including the Joint Supercomputer Center of the Russian Academy of Sciences (now the Department of Supercomputer Systems and Parallel Computing at the National Research Center "Kurchatov Institute"). As part of these studies, the structure of the CDSSK ontology, a language for its description, and a number of unified software tools have been developed to support the formation of the ontologies of individual subspaces and the input of object attributes and named relationships of various types and kinds into the CDSSK. Currently, the formation of CDSSK content is being modeled using the example of a universal subspace and a number of thematic subspaces. The results of this modeling are presented below. The attributes and relationships are presented for objects of the "Administrative Units" class belonging to the "Geography" subspace, as well as the "Organizations and Their Subdivisions" class and the "Classification Systems" class belonging to the universal subspace. The ability to navigate through the loaded real resources is demonstrated.
Keldysh Institute of Applied Mathematics' Preprints: Conversion from MS Word to HTML
In recent years, the presentation of full-text scientific articles in HTML has become widespread. This format offers several advantages for online publication compared to the traditional PDF format, owing to its more advanced tools for structuring material, embedding multimedia content, and implementing various interactive and dynamic features. Therefore, the task of converting manuscripts from the traditionally used MS Word and LaTeX formats into a high-quality HTML version capable of realizing the advantages of this format has become relevant. This paper presents the results of applying the approach for converting scientific articles from MS Word to HTML, proposed in previous studies, to Keldysh Institute of Applied Mathematics’ preprints. The interactive capabilities of the resulting HTML versions are described.
Virtual Exhibition “Science in the USSR During the Great Patriotic War” as an Element of the Common Digital Space of Scientific Knowledge
The article is dedicated to the analysis of the virtual exhibition "Science in the USSR during the Great Patriotic War" as an example of integrating digital resources within the electronic library "Scientific Heritage of Russia." The principles of collection formation are discussed, including the distribution of materials by languages, years of publication, and scientific disciplines, as well as technological solutions to facilitate navigation and search. Using the example of an interdisciplinary exhibition, the article demonstrates how digital technologies aid in systematizing and presenting scientific materials.
An Algorithm for Finding an Exact Solution to the Problem of Multiple Travelling Salesmen
This article considers the multiple traveling salesmen problem: on a graph with weighted arcs, find a set of a predetermined number of disjoint cycles in which the weight (the sum of the arc weights) of the heaviest cycle is minimal. An exact algorithm for solving the problem, based on the branch-and-bound method, has been developed. Like the well-known algorithm of Balas and Christofides for the traveling salesman problem, the constructed algorithm uses the Hungarian algorithm for solving the assignment problem. Numerical experiments with random graphs of large dimension have been carried out.
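The role of the assignment problem inside such a branch-and-bound scheme can be sketched as follows. This is a plain-Python illustration of the general idea, not the authors' algorithm: the optimum of the assignment problem on the arc-weight matrix (with self-loops forbidden) is a lower bound for cycle-cover problems, because every set of disjoint cycles is in particular an assignment. For a small matrix we replace the Hungarian algorithm with brute-force enumeration.

```python
# Illustrative only: assignment-problem optimum as a lower bound
# for cycle-cover problems such as the multiple TSP.
from itertools import permutations

INF = float("inf")

def assignment_lower_bound(cost):
    """Minimum-cost assignment on `cost`; INF on the diagonal forbids self-loops."""
    n = len(cost)
    best = INF
    for perm in permutations(range(n)):          # each perm = one successor map
        total = sum(cost[i][perm[i]] for i in range(n))
        best = min(best, total)
    return best

# 4-city example: arc weights, with forbidden self-loops on the diagonal.
cost = [
    [INF, 2, 9, 10],
    [1, INF, 6, 4],
    [15, 7, INF, 8],
    [6, 3, 12, INF],
]
print(assignment_lower_bound(cost))  # 21
```

A real branch-and-bound code would compute this bound with the Hungarian algorithm in polynomial time and then branch on the arcs of subcycles in the optimal assignment.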
Errors of Artificial Intelligence in Solving Combinatorial Problems
Several combinatorial exercises are considered that artificial intelligence solves with errors. The artificial intelligence systems examined are ChatGPT and DeepSeek. The questions (prompts) posed to these systems are provided, and the answers obtained are analyzed. Hypotheses are proposed regarding the reasons for the errors made by artificial intelligence when solving the tasks under consideration. It is suggested that similar errors may occur when artificial intelligence is used for software development and other applications. Topics for further research are proposed that may be of interest for determining the conditions for the continued use of artificial intelligence.
Artificial Intelligence in Several Fragments
This paper is a mosaic of vivid fragments describing the industrial aspects of artificial intelligence (AI). These are sketches of an overall picture that will likely never be completed, as each day brings information about new achievements, ideas, and threats. The discussion covers civilian AI in the workplace, the development of algorithms for intelligent games, the threats and dangers posed by AI, AI ethics, and standards and international norms for artificial intelligence. Each fragment is a review of the latest (mid-January 2026) Russian and international sources, including quotes, translations, screenshots, and links to original documents.
Outside the scope of this text remains an immense "fragment" on the benefits of AI applications, the area that is advancing at the greatest speed. Perhaps it will become the beginning of a separate, never-ending study.
Separation of Heat Flow Processes in the North Atlantic into Various Components and Their Analysis
The heat flux distribution in the North Atlantic, calculated using a stochastic difference equation scheme, namely a first-order autoregressive scheme with random coefficients, is studied. The ERA5 database, containing geophysical data for 40 years (1979–2018), is used. The coefficients of the autoregressive series were previously determined from these data, and it is shown that the conditions imposed on the coefficients ensure the existence and uniqueness of a solution to this difference equation. The method for calculating distributions is based on successive integration using the autoregressive scheme. Computational experiments are conducted and analyzed. It is shown that the theoretically calculated distributions agree well with their empirical counterparts. Further, after decomposing the original time series into an extracted mean (trend) and a residual, the latter is analyzed as a stationary random process. Sample correlation functions were calculated, and it is shown that they are well approximated by known analytical expressions. These approximations allow explicit filtering and prediction of the process under study. Numerical calculations were performed on the Lomonosov-2 supercomputer at Moscow State University.
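A first-order autoregressive scheme with random coefficients, of the kind described above, can be sketched as follows. This is our illustrative simulation, not the paper's computation: the update is x[t] = a[t]·x[t-1] + b[t], where a[t] and b[t] are drawn independently at each step, and the coefficient distributions below are invented for the example (a classical sufficient condition for a stationary solution is E[log|a|] < 0).

```python
# Illustrative sketch of an AR(1) scheme with random coefficients:
#   x[t] = a[t] * x[t-1] + b[t]
import random

def simulate_ar1_random_coeffs(n, a_sampler, b_sampler, x0=0.0, seed=0):
    """Generate a path of length n+1 of the random-coefficient AR(1) scheme."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n):
        x = a_sampler(rng) * x + b_sampler(rng)  # one step of the scheme
        path.append(x)
    return path

# Example coefficients (invented): a ~ Uniform(0.2, 0.8), so E[log a] < 0,
# and b ~ N(0, 1); the resulting process settles into a stationary regime.
path = simulate_ar1_random_coeffs(
    1000,
    a_sampler=lambda r: r.uniform(0.2, 0.8),
    b_sampler=lambda r: r.gauss(0.0, 1.0),
)
print(len(path))  # 1001
```

Fixing the seed makes runs reproducible, which is convenient when comparing simulated distributions against theoretical ones.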
The Place of Lisp in Teaching Functional Programming
This article examines the main problems of teaching functional programming to students already familiar with the imperative paradigm. The learner model and the main difficulties that arise in this case (mutable variables, loops, sequential computation) are described. A detailed example of the transition from the imperative to the functional paradigm is given. Returning functions as values is examined in detail using the examples of numerical differentiation and interpolation. An implementation of lazy evaluation based on anonymous functions is discussed. It is shown that the multi-paradigm Lisp language is a convenient introduction to the functional paradigm.
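The two techniques mentioned, returning a function as a value (here for numerical differentiation) and lazy evaluation built from anonymous functions, can be sketched in Python; the article itself works in Lisp, so this is only an illustration of the same ideas in another notation.

```python
# Higher-order functions and lazy evaluation via anonymous functions.

def derivative(f, h=1e-6):
    # Returns a NEW function approximating f' by a central difference.
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

def delay(thunk):
    # Memoizing promise: the zero-argument lambda runs at most once,
    # on the first force; later forces return the cached value.
    cache = []
    def force():
        if not cache:
            cache.append(thunk())
        return cache[0]
    return force

dsq = derivative(lambda x: x * x)   # d/dx x^2 = 2x
print(round(dsq(3.0), 6))           # 6.0

promise = delay(lambda: 2 ** 10)    # nothing is computed yet
print(promise())                    # 1024, computed on the first call
```

The `delay`/`force` pair mirrors the classical Lisp promise: wrapping an expression in a zero-argument lambda postpones its evaluation until explicitly forced.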
Using Open Archives of Scaled Vertical-Sounding Ionograms as Labeled Data for Training Machine Learning Models
We put forward the idea of using the large available archives of ionogram scaling results from vertical radio sounding of the ionosphere as training datasets for building predictive models with machine learning methods. The most common formats for saving ionogram processing results are reviewed, along with some Internet resources hosting freely available archives of files in these formats. We use these datasets to build predictive models, including models of the time series of critical frequencies of ionospheric layers. Some datasets of ionogram processing results can also be used to train models designed for automatic ionogram processing.
Part 2. Original Articles
Analysis of the Effectiveness of Subword Tokenizers in a Low-Resource Linguistic Environment: Implementation Experience for the Tajik Language
This paper examines modern approaches to subword tokenization of texts as applied to the low-resource Tajik language, which is characterized by a complex morphological structure and a high degree of word-form variability. In the course of the study, a large-scale heterogeneous corpus was compiled and preprocessed, comprising 99 books and 134,497 textual articles of various genres and topics, with a total volume exceeding 33 million tokens. The corpus was cleaned of noise, normalized, and used as a basis for training and subsequent testing of subword models.
Based on this corpus, five tokenization models implementing the BPE, WordPiece, and Unigram algorithms were trained and analyzed using the Hugging Face Tokenizers and SentencePiece libraries. Comparative evaluation was conducted using a set of key metrics, including the proportion of out-of-vocabulary (OOV) words, the degree of text representation compression, tokenization speed, as well as characteristics of n-gram distribution, which make it possible to assess the ability of the models to capture the morphological and structural organization of the language. The experimental results made it possible to identify the strengths and weaknesses of different approaches to subword segmentation and to determine the most effective tokenization strategies under conditions of the morphological complexity of the Tajik language. The findings obtained can be used in the development of language models and applied NLP tools for Tajik and other low-resource languages, contributing to the expansion of their presence in the digital environment.
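Two of the evaluation metrics mentioned, the out-of-vocabulary (OOV) rate and the compression ratio of a tokenization, can be sketched as follows. This is a toy illustration with an invented subword mini-vocabulary for the Tajik word "китобхонӣ" ("reading"), not the paper's trained models or corpus.

```python
# Illustrative tokenizer-evaluation metrics on hand-made toy data.

def oov_rate(words, vocab):
    """Share of whole words not covered by the tokenizer's vocabulary."""
    return sum(1 for w in words if w not in vocab) / len(words)

def compression_ratio(text, tokens):
    """Average number of characters represented by one subword token."""
    return len(text) / len(tokens)

vocab = {"китоб", "хон", "##ӣ"}        # toy subword vocabulary (invented)
words = ["китоб", "хон", "мактаб"]     # "мактаб" is OOV for this vocabulary
r = oov_rate(words, vocab)
c = compression_ratio("китобхонӣ", ["китоб", "хон", "##ӣ"])
print(r, c)  # 0.333... 3.0
```

A higher compression ratio means fewer tokens per character, which for a morphologically rich language indicates that the vocabulary captures productive stems and affixes rather than falling back to character-level pieces.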
Scientific Publications and the Embedding Space of Knowledge
The article examines current challenges in scientometrics arising from the surge in publication activity and the widespread adoption of generative artificial intelligence. The existing scientometric toolkit for analyzing research activity is reviewed, categorized into quantitative metrics and science mapping methods (citation network analysis, academic genealogy, semantic analysis, etc.). An attempt is made to overcome the limitations of traditional citation analysis, such as “semantic blindness” and vulnerability to manipulation. As a potential solution, a conceptual model is proposed where the unit of analysis shifts from the publication as a whole to an individual “key statement”. This approach involves recording not only the statement’s content but also its type, area of relevance, and its logical relationship with other claims (confirmation, refutation, clarification, generalization, etc.). Within this framework, principles for calculating modified scientometric metrics are introduced.
The proposed model was tested on a corpus of 728 articles from the Russian journal Informatics and Education (2016–2025). An analysis conducted using large language models revealed that retrospective extraction of statements faces significant hurdles due to established cultures of scientific communication. Consequently, the study highlights the advantages of having authors formulate key statements themselves as a distinct type of metadata. In conclusion, the paper outlines development paths for the concept of an “embedding space of knowledge,” which could eventually complement existing approaches to analyzing the evolution of scientific ideas and theories.
Artificial Intelligence Methods for Solving an Integral Equation with a Fractional Grünwald–Letnikov Integral
A computational scheme for the approximate solution of an integral equation with the Grünwald–Letnikov fractional integral has been developed, based on the least squares method. A distinctive feature of this scheme is the use of a neural network to compute the coefficients for the least squares method. The relevance of the study stems from the fact that artificial intelligence is now increasingly being applied to many practical problems related to various physical processes. An estimate of the convergence of approximate solutions to the exact solution has been obtained. Possible directions for the further application of artificial intelligence in solving physical problems are also considered.
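The Grünwald–Letnikov operator underlying such equations can be sketched via its standard discrete definition (our illustration, independent of the paper's neural-network scheme): D^a f(t) ≈ h^(-a) · Σ_{k=0..N} w_k f(t - kh) with N = t/h, w_0 = 1, and the recurrence w_k = w_{k-1}·(1 - (a + 1)/k); a negative order a gives the fractional integral.

```python
# Standard Grünwald–Letnikov discretization (illustrative sketch).

def gl_weights(a, n):
    """Binomial weights w_k = (-1)^k * C(a, k) via the usual recurrence."""
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (1.0 - (a + 1.0) / k))
    return w

def gl_operator(f, a, t, h):
    """GL fractional derivative (a > 0) or integral (a < 0) of f at t, step h."""
    n = int(round(t / h))
    w = gl_weights(a, n)
    return sum(w[k] * f(t - k * h) for k in range(n + 1)) / h ** a

# Sanity checks against known answers:
# the order-1 derivative of f(t) = t is 1,
# and the order-1 integral of f(t) = 1 over [0, 1] is 1 (up to O(h)).
print(gl_operator(lambda t: t, 1.0, 0.5, 0.01))     # ≈ 1.0
print(gl_operator(lambda t: 1.0, -1.0, 1.0, 0.01))  # ≈ 1.0 (error O(h))
```

For integer orders the weights truncate to the familiar finite-difference or quadrature stencils, which makes the formula a convenient building block for least-squares discretizations of such equations.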
Modeling and Calculation Method of Reinforced Loaded Rod Structures
The paper presents mathematical models developed by the authors and a variational method for calculating spatial rod structures reinforced in a loaded state. The proposed models and calculation method offer broader capabilities compared to existing ones. Their application enables calculations for rod systems reinforced through methods such as increasing the cross-sections of rods, modifying the design scheme, and altering the deformed state. Complex rod structures with various cross-sectional shapes are examined. The calculations are based on the hypotheses of Timoshenko's beam theory. For thin-walled rods, the principles of thin-walled rod theory are additionally applied, taking shear deformations into account. It is assumed that the material of the rod element follows the diagram of a linearly hardening material. The calculation of the stress-strain state (SSS) of the reinforced structure is performed in stages:
1) at the initial stage, displacements and stresses in the structural elements under the action of initial loads are calculated;
2) at the second stage, the values of the mounting forces and stresses arising when reinforcing elements are attached to the main structural elements are determined;
3) at the final stage, the reinforced structure is analyzed under the effects of additional loads applied after reinforcement.
Based on the proposed models and calculation method, examples of calculation for operational structures reinforced by various methods are provided.
An Ontological Model for Integrating Cognitive and Sociological Data for Personnel Assessment
In the context of digital transformation of organizations and the growing volume of data, there is a demand for more transparent and explainable approaches to employee evaluation. The purpose of the study is to design and validate an ontological model (OWL 2/SHACL) that integrates employees’ cognitive indicators and sociological characteristics into a unified knowledge space to support HR processes. The scientific novelty of the work lies in the development of a unified semantic model linking data from cognitive tests, questionnaires, work context, and performance indicators; in the formulation of competency questions (CQ) that trigger reasoning mechanisms within the knowledge graph; and in the creation of patterns for predicting competency gaps, identifying the risk of overload/burnout, while ensuring ethics and non-discrimination control. The proposed approach is based on ontology engineering methodologies – METHONTOLOGY and NeOn, semantic web concepts, and psychometric methods.