Vision Transformer Quantization: A CPU-Centric Analysis of the Trade-Off Between Model Size and Inference Speed

Amir Ramisovich Nigmatullin
Rustam Arifovich Lukmanov
Ahmad Taha

Abstract

How to Cite
Nigmatullin, A. R., R. A. Lukmanov, and A. Taha. "Vision Transformer Quantization: A CPU-Centric Analysis of the Trade-Off Between Model Size and Inference Speed." Электронные библиотеки (Russian Digital Libraries Journal), vol. 29, no. 1, February 2026, pp. 262–286, doi:10.26907/1562-5419-2026-29-1-262-286.
