Neural Network for Generating Images Based on Song Lyrics Using OpenAI and CLIP Models

Alsu Rishatovna Davletgareeva; Ksenia Aleksandrovna Edkova

doi:10.26907/1562-5419-2023-26-4-437-455

PDF (Russian)

Published: 28.09.2023

UDC 004.8


Issue

Vol. 26 No. 4 (2023)

Alsu Rishatovna Davletgareeva

Kazan (Volga region) Federal University

https://orcid.org/0009-0008-7258-470X

Ksenia Aleksandrovna Edkova

Kazan (Volga region) Federal University

https://orcid.org/0009-0005-4706-2254

Abstract

This study investigates how effectively the ImageNet diffusion model and CLIP models generate images from textual descriptions. Two experiments were conducted with various text inputs and parameter settings to determine the optimal configuration for generating images from text descriptions. The results showed that while the ImageNet diffusion model performed well at generating images, CLIP achieved better alignment between the text prompts and the relevant images. These findings suggest that combining the two models is a promising approach to producing high-quality, contextually relevant images from textual descriptions.
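The abstract's finding that CLIP aligns prompts with images rests on a simple mechanism: CLIP maps text and images into a shared embedding space and compares them by cosine similarity. The pure-Python sketch below illustrates that comparison and one kind of loss commonly used to steer a diffusion model toward a prompt. It is not the authors' implementation. The toy three-dimensional vectors stand in for real CLIP embeddings, all function names are hypothetical, the logit scale of 100 is assumed to match the temperature of released CLIP checkpoints, and the spherical distance loss is the one popularised by community CLIP-guided diffusion notebooks, which this page does not confirm the authors used.

```python
import math
from typing import List, Sequence

# Hypothetical helpers: toy vectors stand in for real CLIP image/text embeddings.


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length, as CLIP does before comparing embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the unit-normalized vectors (1.0 means same direction)."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def prompt_probabilities(
    image_emb: Sequence[float],
    text_embs: Sequence[Sequence[float]],
    logit_scale: float = 100.0,  # assumption: temperature of released CLIP models
) -> List[float]:
    """Softmax over scaled image-text similarities, as in CLIP zero-shot scoring."""
    logits = [logit_scale * cosine_similarity(image_emb, t) for t in text_embs]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def spherical_distance_loss(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed guidance loss from community CLIP-guided diffusion notebooks.

    For unit vectors separated by angle theta, the chord |a - b| equals
    2*sin(theta/2), so 2*asin(|a - b|/2)**2 == theta**2 / 2.
    """
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    chord = math.sqrt(sum((x - y) ** 2 for x, y in zip(a_n, b_n)))
    # Clamp to guard against rounding pushing the argument slightly above 1.
    return 2.0 * math.asin(min(1.0, chord / 2.0)) ** 2


if __name__ == "__main__":
    image = [0.9, 0.1, 0.0]  # toy "image" embedding
    prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy matching / unrelated prompts
    print(prompt_probabilities(image, prompts))
    print([spherical_distance_loss(image, p) for p in prompts])
```

In a guided-diffusion loop, the gradient of such a loss with respect to the noisy image (computed through CLIP's image encoder) would typically be used to nudge each denoising step toward the prompt; the sketch stops at the loss itself.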

Keywords:

image generation, artificial intelligence, ImageNet diffusion model, CLIP, deep learning, neural networks, natural language processing.

How to Cite

Davletgareeva, A. R., and K. A. Edkova. “Neural Network for Generating Images Based on Song Lyrics Using OpenAI and CLIP Models”. Russian Digital Libraries Journal, vol. 26, no. 4, Sept. 2023, pp. 437-55, doi:10.26907/1562-5419-2023-26-4-437-455.

References

1. Elasri M., Elharrouss O., Al-Maadeed S., Tairi H. Image Generation: A Review // Neural Processing Letters. 2022. Vol. 54. No. 5. P. 4609–4646.
2. Zhang H., Song H., Li S., Zhou M., Song D. A survey of controllable text generation using transformer-based pre-trained language models // arXiv preprint arXiv:2201.05337. 2022.
3. Fundamentals of generative adversarial networks [in Russian]. URL: https://habr.com/ru/articles/726254/
4. Brown T., Mann B., Ryder N., Subbiah M., Kaplan J.D., Dhariwal P., Neelakantan A., Shyam P., Sastry G., Askell A., Agarwal S., Herbert-Voss A., Krueger G., Henighan T., Child R., Ramesh A., Ziegler D.M., Wu J., Winter C., Hesse C., Chen M., Sigler E., Litwin M., Gray S., Chess B., Clark J., Berner C., McCandlish S., Radford A., Sutskever I., Amodei D. Language models are few-shot learners // Advances in Neural Information Processing Systems. 2020. Vol. 33. P. 1877–1901.
5. DALL·E 2. URL: https://openai.com/product/dall-e-2
6. How AI is Transforming Text-to-Image Generation. URL: https://nesesho.com/index.php/2023/04/12/how-ai-is-transforming-text-to-image-generation/
7. OpenAI · GitHub. URL: https://github.com/openai
8. Gulrajani I., Ahmed F., Arjovsky M., Dumoulin V., Courville A.C. Improved training of Wasserstein GANs // Advances in Neural Information Processing Systems. 2017. Vol. 30. P. 5767–5777.
9. Indolia S., Goswami A.K., Mishra S.P., Asopa P. Conceptual understanding of convolutional neural network – a deep learning approach // Procedia Computer Science. 2018. Vol. 132. P. 679–688.
10. Laudani A., Lozito G.M., Fulginei F.R., Salvini A. On training efficiency and computational costs of a feed forward neural network: a review // Computational Intelligence and Neuroscience. 2015. P. 83–83.
11. CLIP. URL: https://github.com/openai/CLIP
12. Dhariwal P., Nichol A. Diffusion models beat GANs on image synthesis // Advances in Neural Information Processing Systems. 2021. Vol. 34. P. 8780–8794.
13. Kim G., Kwon T., Ye J.C. DiffusionCLIP: Text-guided diffusion models for robust image manipulation // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. P. 2426–2435.

This work is licensed under a Creative Commons Attribution 4.0 International License.

