Text to image synthesis using StackGAN

Warade, Purva Sandeep, Shinde, Swati, Virdee, Bal Singh and Khanna, Ashish (2025) Text to image synthesis using StackGAN. In: 2025 9th International Conference on Computing, Communication, Control and Automation (ICCCBEA), 22-23 August 2025, Pune, India.

Abstract

Text-to-image synthesis is a challenging task: it requires translating descriptive text into visual content, which means mapping between language and image representations. Among the approaches proposed, StackGAN stands out for its pioneering two-stage architecture. The first stage generates low-resolution images that capture the global structure described by the text. The second stage then refines these results into high-resolution, realistic images with improved detail. This paper reports the performance of StackGAN on a subset of the Flickr dataset, with embeddings of the textual descriptions obtained from the Universal Sentence Encoder (USE). Experimental results show that the model produces images that are both semantically and visually coherent. The study also highlights the potential of StackGAN for creative content generation and discusses open challenges such as maintaining diversity and mitigating artifacts.
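The two-stage pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch of its data flow. This is not the authors' implementation. It assumes the design of the original StackGAN: each stage draws a conditioning vector from the text embedding via conditioning augmentation (with a KL regularizer), Stage-I maps that vector plus noise to a coarse image, and Stage-II refines the coarse image using the text again but no fresh noise. The layer sizes (128-d conditioning, 100-d noise, 64x64 and 256x256 resolutions) are the original StackGAN defaults and are assumptions here. The 512-d input matches the output size of the Universal Sentence Encoder. All helper names are hypothetical, the weights are untrained random matrices, and a random unit vector stands in for a real USE embedding. The "generators" are toy linear maps, not convolutional networks.

```python
import numpy as np

# Assumed sizes (illustrative, not taken from the paper): USE sentence
# embeddings are 512-d; 128-d conditioning and 100-d noise follow the original
# StackGAN defaults, as do the 64x64 -> 256x256 stage resolutions.
EMB_DIM, COND_DIM, NOISE_DIM = 512, 128, 100
LOW, HIGH = 64, 256


def conditioning_augmentation(phi, w_mu, w_logvar, rng):
    """Sample c ~ N(mu(phi), diag(sigma(phi)^2)) via the reparameterization trick.

    Also returns KL(N(mu, sigma^2) || N(0, I)), the regularizer StackGAN adds
    to the generator loss to keep the conditioning manifold smooth.
    """
    mu = phi @ w_mu
    logvar = phi @ w_logvar
    c = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    kl = 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar)
    return c, kl


def stage1(c, z, w):
    """Toy Stage-I: map [c; z] to a coarse LOW x LOW x 3 image in [-1, 1]."""
    return np.tanh(np.concatenate([c, z]) @ w).reshape(LOW, LOW, 3)


def stage2(img_low, c, w_res):
    """Toy Stage-II: upsample the Stage-I image and add a text-conditioned residual.

    A real Stage-II encodes the Stage-I image, fuses it with c through residual
    blocks and learns the upsampling; nearest-neighbour repetition plus a
    per-channel residual only mimics that data flow.
    """
    k = HIGH // LOW
    up = img_low.repeat(k, axis=0).repeat(k, axis=1)
    residual = np.tanh(c @ w_res)  # shape (3,), broadcast over every pixel
    return np.clip(up + 0.1 * residual, -1.0, 1.0)


def generate(phi, rng, scale=0.01):
    """Run both stages with untrained random weights (shapes and data flow only)."""
    def w(*shape):
        return scale * rng.standard_normal(shape)

    c1, kl1 = conditioning_augmentation(phi, w(EMB_DIM, COND_DIM), w(EMB_DIM, COND_DIM), rng)
    z = rng.standard_normal(NOISE_DIM)
    low = stage1(c1, z, w(COND_DIM + NOISE_DIM, LOW * LOW * 3))
    # Stage-II uses its own conditioning augmentation but no fresh noise.
    c2, kl2 = conditioning_augmentation(phi, w(EMB_DIM, COND_DIM), w(EMB_DIM, COND_DIM), rng)
    high = stage2(low, c2, w(COND_DIM, 3))
    return low, high, kl1 + kl2


rng = np.random.default_rng(0)
phi = rng.standard_normal(EMB_DIM)
phi /= np.linalg.norm(phi)  # stand-in for a (unit-length) USE embedding
low, high, kl = generate(phi, rng)
print(low.shape, high.shape)
```

In a trained model, each stage also has its own discriminator that scores image and text pairs. The sketch omits these because it only traces how a sentence embedding becomes a coarse image and then a refined one.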

Documents
StackGAN - accepted.pdf - Accepted Version
Available under License Creative Commons Attribution 4.0.
