A Tool for Rapid Diagnostics of Memory in Neural Network Architectures of Language Models

Pavel Andreevich Gavrikov
Azamat Komiljon ugli Usmanov
Dmitriy Revayev
Sergey Nikolaevich Buzykanov

Abstract

Language models have evolved from simple n-gram systems into today's universal Large Language Model (LLM) architectures; however, a key limitation remains the quadratic complexity of the self-attention mechanism with respect to input sequence length. This complexity sharply increases memory consumption and computational cost and, as tasks requiring extremely long contexts emerge, creates the need for new architectural solutions. Because evaluating a proposed architecture typically requires long and expensive full-scale training, a tool is needed that provides a rapid preliminary assessment of a model's internal memory capacity.


This paper presents a method for quantitative evaluation of the internal memory of neural network architectures based on synthetic tests that do not require large data corpora. Internal memory is defined as the amount of information a model can reproduce without direct access to its original inputs.
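As an illustration of how such a measure can be scored, the sketch below is a hypothetical example rather than the authors' exact protocol: it assumes a HuggingFace-style causal language model (e.g. GPT-2) with a generate method and counts the fraction of target tokens reproduced under greedy decoding.

    import torch

    def memory_score(model, prompt_ids, target_ids):
        # Hypothetical sketch: feed the prompt once, generate greedily,
        # and count how many target tokens are reproduced exactly.
        # Assumes tensors of shape (1, prompt_len) and (1, target_len).
        with torch.no_grad():
            out = model.generate(
                prompt_ids,                          # context holding the data to be recalled
                max_new_tokens=target_ids.shape[-1],
                do_sample=False,                     # greedy decoding: pure recall, no sampling noise
            )
        generated = out[0, prompt_ids.shape[-1]:]    # tokens emitted after the prompt
        n = min(generated.shape[-1], target_ids.shape[-1])
        correct = (generated[:n] == target_ids[0, :n]).sum().item()  # exact-match positions
        return correct / target_ids.shape[-1]        # fraction of the target recalled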


To validate the approach, a software framework was developed and tested on the GPT-2 and Mamba architectures. The experiments employed copy, inversion, and associative retrieval tasks. Comparison of prediction accuracy, error distribution, and computational cost enables a fast assessment of the efficiency and potential of various LLM architectures.
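The exact sequence formats are not specified in the abstract; the sketch below assumes a toy integer vocabulary with hypothetical separator and query markers and shows how prompt/answer pairs for the three synthetic tasks could be generated.

    import random

    VOCAB = list(range(4, 260))   # toy token ids; 0-3 reserved for special markers
    SEP, QUERY = 1, 2             # hypothetical separator and query markers

    def copy_task(n):
        # Copy: reproduce the sequence verbatim after the separator.
        seq = random.choices(VOCAB, k=n)
        return seq + [SEP], seq

    def inversion_task(n):
        # Inversion: reproduce the sequence in reverse order.
        seq = random.choices(VOCAB, k=n)
        return seq + [SEP], seq[::-1]

    def associative_retrieval_task(n_pairs):
        # Associative retrieval: given key-value pairs and a query key,
        # return the value associated with that key.
        keys = random.sample(VOCAB, n_pairs)
        values = random.choices(VOCAB, k=n_pairs)
        pairs = [tok for k, v in zip(keys, values) for tok in (k, v)]
        q = random.randrange(n_pairs)
        return pairs + [QUERY, keys[q]], [values[q]]

One natural usage is to sweep such generators over increasing sequence lengths and score the outputs with a metric like the one sketched above, producing accuracy-versus-length curves on which architectures such as GPT-2 and Mamba can be compared without full-scale training.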

Article Details

How to Cite
Gavrikov, P. A., A. K. ugli Usmanov, D. Revayev, and S. N. Buzykanov. “A Tool for Rapid Diagnostics of Memory in Neural Network Architectures of Language Models”. Russian Digital Libraries Journal, vol. 28, no. 6, Dec. 2025, pp. 1346-67, doi:10.26907/1562-5419-2025-28-6-1346-1367.
