Creating Pseudowords Generator and Classifier of Their Similarity with Words from Russian Dictionary using Machine Learning
Abstract
In this article, a pseudoword is defined as a unit of speech or text that looks like a real Russian word but has no meaning. A real, or natural, word is a unit of speech or text that has an interpretation and appears in a dictionary. The paper presents two models for working with the Russian language: a generator that creates pseudowords resembling real words, and a classifier that evaluates how similar an input character sequence is to real words. The classifier is used to evaluate the generator's output. Both models are based on recurrent neural networks with long short-term memory (LSTM) layers and are trained on a dataset of Russian nouns. The research produced a file containing a list of pseudowords created by the generator model; the classifier then evaluated these words and filtered out those not similar enough to real words. The generated pseudowords have potential applications in tasks such as naming and brand creation, layout design, art, creative work, and linguistic studies of language structure and words.
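The generate-then-filter workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a character bigram (Markov) model stands in for both LSTM networks so that the example runs with the Python standard library alone. The word list `NOUNS`, the function names, and the default `threshold` are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy sample; the article trains on a full dataset of Russian nouns.
NOUNS = ["молоко", "корова", "дорога", "город", "голова", "погода",
         "работа", "ворота", "долина", "колода", "порода", "золото"]

START, END = "^", "$"  # markers for word start and word end


def train_bigrams(words):
    """Count character-to-next-character transitions, including markers."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        chars = [START] + list(word) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng, max_len=12):
    """Stand-in generator: sample characters until END or max_len."""
    out, prev = [], START
    while len(out) < max_len:
        options = counts[prev]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


def similarity(counts, alphabet_size, text):
    """Stand-in classifier score: geometric mean of add-one-smoothed
    transition probabilities, a value in (0, 1]."""
    chars = [START] + list(text) + [END]
    log_p = 0.0
    for prev, nxt in zip(chars, chars[1:]):
        row = counts.get(prev, {})
        total = sum(row.values())
        log_p += math.log((row.get(nxt, 0) + 1) / (total + alphabet_size))
    return math.exp(log_p / (len(chars) - 1))


def pseudowords(words, n_candidates=200, threshold=0.10, seed=0):
    """Generate candidates, drop real words, keep those scoring >= threshold."""
    rng = random.Random(seed)
    counts = train_bigrams(words)
    alphabet_size = len({c for w in words for c in w}) + 1  # +1 for END
    real = set(words)
    kept = {}
    for _ in range(n_candidates):
        cand = generate(counts, rng)
        if len(cand) < 3 or cand in real:
            continue  # too short, or already a dictionary word
        score = similarity(counts, alphabet_size, cand)
        if score >= threshold:
            kept[cand] = score
    return sorted(kept.items(), key=lambda kv: -kv[1])


if __name__ == "__main__":
    for word, score in pseudowords(NOUNS)[:10]:
        print(f"{word}\t{score:.3f}")
```

The same structure carries over if the two stand-in functions are replaced by trained neural models: `generate` would sample from the generator network, and `similarity` would return the classifier's output for the candidate string.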
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.
By submitting an article for publication in the Russian Digital Libraries Journal (RDLJ), the authors automatically consent to grant Kazan (Volga) Federal University (KFU) a limited license to use the materials (only if the article is accepted for publication). This means that KFU has the right to publish the article in the next issue of the journal (on the website or in print), to reprint it in RDLJ archives on CD, and to include it in any information system or database produced by KFU.
All copyrighted materials are placed in RDLJ with the consent of the authors. If any author objects to the publication of materials on this site, the material can be removed upon written notification to the Editor.
Documents published in RDLJ are protected by copyright, and all rights are reserved by the authors. Authors independently monitor compliance with their rights to reproduce or translate their papers published in the journal. If material published in RDLJ is reprinted with permission by another publisher or translated into another language, a reference to the original publication is required.
By submitting an article for publication in RDLJ, authors should take into account that publication on the Internet, on the one hand, provides unique opportunities for access to their content but, on the other hand, is a new form of information exchange in the global information society, in which authors and publishers are not always protected against unauthorized copying or other use of copyrighted materials.
RDLJ is copyrighted. When using materials from the journal, the following URL must be indicated: index.phtml page = elbib / rus / journal?. Changing, adding to, or editing the author's text is not allowed. Individual fragments of articles from the journal may be copied, and users may distribute, remix, adapt, and build upon an article, even commercially, as long as they credit that article for the original creation.
Requests for the right to reproduce or use any of the materials published in RDLJ should be addressed to the Editor-in-Chief, A.M. Elizarov, at the following address: amelizarov@gmail.com.
The publishers of RDLJ are not responsible for the views set out in the published articles.
We ask authors to download the copyright agreement on the transfer of non-exclusive rights to use the work from this page, sign it, and send a scanned copy to the journal publisher by e-mail.