A Tool for Rapid Diagnostics of Memory in Neural Network Architectures of Language Models

Pavel Andreevich Gavrikov
Azamat Komiljon ugli Usmanov
Dmitriy Revayev
Sergey Nikolaevich Buzykanov

Abstract

Language models have evolved from simple n-gram systems into today's universal Large Language Model (LLM) architectures; however, a key limitation remains the quadratic complexity of the self-attention mechanism with respect to input sequence length. This complexity sharply increases memory consumption and computational cost and, as tasks requiring extremely long contexts emerge, creates the need for new architectural solutions. Because evaluating a proposed architecture typically requires long and expensive full-scale training, a tool is needed that provides a rapid preliminary assessment of a model's internal memory capacity.


This paper presents a method for quantitative evaluation of the internal memory of neural network architectures based on synthetic tests that do not require large data corpora. Internal memory is defined as the amount of information a model can reproduce without direct access to its original inputs.
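As an illustration of how such a measure can be scored, the sketch below is a hypothetical example rather than the authors' exact protocol: it assumes a HuggingFace-style causal language model (e.g. GPT-2) with a generate method and counts the fraction of target tokens reproduced under greedy decoding.

    import torch

    def memory_score(model, prompt_ids, target_ids):
        # Hypothetical sketch: feed the prompt once, generate greedily,
        # and count how many target tokens are reproduced exactly.
        # Assumes tensors of shape (1, prompt_len) and (1, target_len).
        with torch.no_grad():
            out = model.generate(
                prompt_ids,                          # context holding the data to be recalled
                max_new_tokens=target_ids.shape[-1],
                do_sample=False,                     # greedy decoding: pure recall, no sampling noise
            )
        generated = out[0, prompt_ids.shape[-1]:]    # tokens emitted after the prompt
        n = min(generated.shape[-1], target_ids.shape[-1])
        correct = (generated[:n] == target_ids[0, :n]).sum().item()  # exact-match positions
        return correct / target_ids.shape[-1]        # fraction of the target recalled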


To validate the approach, a software framework was developed and tested on the GPT-2 and Mamba architectures. The experiments employed copy, inversion, and associative retrieval tasks. Comparison of prediction accuracy, error distribution, and computational cost enables a fast assessment of the efficiency and potential of various LLM architectures.
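The exact sequence formats are not specified in the abstract; the sketch below assumes a toy integer vocabulary with hypothetical separator and query markers and shows how prompt/answer pairs for the three synthetic tasks could be generated.

    import random

    VOCAB = list(range(4, 260))   # toy token ids; 0-3 reserved for special markers
    SEP, QUERY = 1, 2             # hypothetical separator and query markers

    def copy_task(n):
        # Copy: reproduce the sequence verbatim after the separator.
        seq = random.choices(VOCAB, k=n)
        return seq + [SEP], seq

    def inversion_task(n):
        # Inversion: reproduce the sequence in reverse order.
        seq = random.choices(VOCAB, k=n)
        return seq + [SEP], seq[::-1]

    def associative_retrieval_task(n_pairs):
        # Associative retrieval: given key-value pairs and a query key,
        # return the value associated with that key.
        keys = random.sample(VOCAB, n_pairs)
        values = random.choices(VOCAB, k=n_pairs)
        pairs = [tok for k, v in zip(keys, values) for tok in (k, v)]
        q = random.randrange(n_pairs)
        return pairs + [QUERY, keys[q]], [values[q]]

One natural usage is to sweep such generators over increasing sequence lengths and score the outputs with a metric like the one sketched above, producing accuracy-versus-length curves on which architectures such as GPT-2 and Mamba can be compared without full-scale training.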

Article Details

How to Cite
Gavrikov, P. A., A. K. ugli Usmanov, D. Revayev, and S. N. Buzykanov. “A Tool for Rapid Diagnostics of Memory in Neural Network Architectures of Language Models”. Russian Digital Libraries Journal, vol. 28, no. 6, Dec. 2025, pp. 1346-67, doi:10.26907/1562-5419-2025-28-6-1346-1367.
