Russian Digital Libraries Journal

Published since 1998
ISSN 1562-5419

Search Results

Exploring Post-Training Quantization of Large Language Models with a Focus on Russian Evaluation

Dmitrii Romanovich Poimanov, Mikhail Sergeevich Shutov
Pages 1138-1163
Abstract:

The rapid adoption of large language models (LLMs) has made quantization a central technique for enabling efficient deployment under real-world hardware and memory constraints. While English-centric evaluations of low-bit quantization are increasingly available, much less is known about its effects on morphologically rich and resource-diverse languages such as Russian. This gap is particularly important given the recent emergence of high-performing Russian and multilingual LLMs. In this work, we conduct a systematic study of 2-, 3-, and 4-bit post-training quantization (PTQ) for state-of-the-art Russian LLMs across different model scales (4B and 32B). Our experimental setup covers both standard uniform quantization and specialized low-bit formats, as well as lightweight finetuning for recovery in the most extreme 2-bit setting. Our findings highlight several important trends: (i) the tolerance of Russian LLMs to quantization differs across model families and scales; (ii) 4-bit quantization is generally robust, especially when advanced formats are used; (iii) 3-bit models expose sensitivity to calibration data and scaling strategies; and (iv) 2-bit models, while severely degraded under naive PTQ, can be partially restored through short finetuning. Empirical results show that the model's domain must be considered when using different quantization techniques.

Keywords: neural network quantization, compression and optimization of large language models.
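
For context on the "standard uniform quantization" baseline mentioned in the abstract, the sketch below shows generic symmetric round-to-nearest post-training weight quantization in NumPy. The per-row (per-output-channel) scales, the chosen bit-widths, and the helper name quantize_dequantize are illustrative assumptions; this is not the authors' pipeline, which additionally covers specialized low-bit formats and recovery finetuning for the 2-bit setting.

# Hedged sketch: symmetric round-to-nearest uniform post-training weight quantization.
# Illustrates the generic PTQ baseline only; bit-widths and per-row granularity are
# assumptions for this example, not the method evaluated in the article.
import numpy as np

def quantize_dequantize(weights: np.ndarray, bits: int = 4) -> np.ndarray:
    """Quantize a 2-D weight matrix with one scale per output channel (row),
    then immediately dequantize, returning the 'fake-quantized' weights."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for signed 4-bit
    scale = np.abs(weights).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0.0, 1.0, scale)      # guard against all-zero rows
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 1024)).astype(np.float32)
    for b in (4, 3, 2):
        w_hat = quantize_dequantize(w, bits=b)
        mse = float(np.mean((w - w_hat) ** 2))
        print(f"{b}-bit round-to-nearest, per-row scales: MSE = {mse:.5f}")

Per-row scales are a common default because a single outlier weight then inflates the quantization step for only one output channel rather than the whole matrix; the error printed above grows sharply from 4-bit to 2-bit, which is consistent with the degradation pattern the abstract describes.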