On the Way to Creating Parallelizing Compilers for Computing Systems with Distributed Memory

Boris Yakovlevich Steinberg

doi:10.26907/1562-5419-2024-27-1-127-149

Published: 08.04.2024

UDC 004.4’422

Issue

Vol. 27 No. 1 (2024): Special issue «Scientific Services & Internet». Part 2

Boris Yakovlevich Steinberg

Southern Federal University

https://orcid.org/0000-0001-8146-0479

Abstract

This paper describes the conditions for creating optimizing parallelizing compilers for distributed-memory computing systems. The target systems are chips of the “supercomputer on a chip” type. The paper presents optimizing program transformations that are specific to distributed-memory systems, as well as transformations that are needed for both distributed-memory and shared-memory systems. It also discusses how to minimize interprocessor transfers when parallelizing a recursive function. The main approach to creating such compilers is block-affine data placement in distributed memory with minimization of interprocessor transfers. It is shown that parallelizing compilers for distributed-memory systems should be built on a high-level internal representation and a high-level output language.
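
As an illustration of what a block-affine placement can look like, the sketch below assumes one common formulation: each index is divided by a block size, the resulting block coordinates are combined by an affine function, and the result is taken modulo the number of processors. The function names, the parameters, and the transfer-counting loop are hypothetical and are not taken from the paper.

```python
def block_affine_owner(index, block, stride, offset, num_procs):
    """Return the processor (memory module) assumed to hold the element at `index`.

    Assumed formula: (offset + sum(stride[k] * (index[k] // block[k]))) % num_procs.
    All parameter names are illustrative.
    """
    if not (len(index) == len(block) == len(stride)):
        raise ValueError("index, block and stride must have the same length")
    block_coords = [i // d for i, d in zip(index, block)]
    return (offset + sum(s * b for s, b in zip(stride, block_coords))) % num_procs


def count_remote_reads(n, block, num_procs, shift=1):
    """Count iterations of `a[i] = b[i + shift]` that need an interprocessor transfer.

    Both 1-D arrays are assumed to use the same placement (stride 1, offset 0),
    and iteration i is assumed to run on the processor that owns a[i].
    """
    transfers = 0
    for i in range(n - shift):
        owner_a = block_affine_owner((i,), (block,), (1,), 0, num_procs)
        owner_b = block_affine_owner((i + shift,), (block,), (1,), 0, num_procs)
        if owner_a != owner_b:
            transfers += 1
    return transfers


if __name__ == "__main__":
    print(count_remote_reads(16, 4, 4))  # blocks of 4 elements
    print(count_remote_reads(16, 1, 4))  # blocks of 1 element (cyclic placement)
```

Under these assumptions, with 16 elements on 4 processors, blocks of 4 elements require a transfer only at the three block boundaries, while blocks of one element require a transfer in each of the 15 iterations. The choice of block size is thus one of the parameters a compiler could tune to reduce interprocessor transfers.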

Keywords:

automatic parallelization, distributed memory, program transformation, data distribution, data interchange.

How to Cite

Steinberg, B. Y. “On the Way to Creating Parallelizing Compilers for Computing Systems With Distributed Memory”. Russian Digital Libraries Journal, vol. 27, no. 1, Apr. 2024, pp. 127-49, doi:10.26907/1562-5419-2024-27-1-127-149.

This work is licensed under a Creative Commons Attribution 4.0 International License.
