Text to image synthesis using StackGAN

Warade, Purva Sandeep; Shinde, Swati; Virdee, Bal Singh; Khanna, Ashish

London Met Repository

Warade, Purva Sandeep, Shinde, Swati, Virdee, Bal Singh and Khanna, Ashish (2025) Text to image synthesis using StackGAN. In: 2025 9th International Conference on Computing, Communication, Control and Automation (ICCCBEA), 22-23 August 2025, Pune, India.

Abstract

Text-to-image synthesis is a difficult task: it translates descriptive text into visual content, which requires mapping between language and image representations. Among several approaches, StackGAN is one of the most notable frameworks because of its pioneering two-stage architecture. The first stage generates low-resolution images that capture the global structure. The second stage takes the first-stage results and refines them into high-resolution, realistic images with improved detail. This paper reports the performance of StackGAN on a subset of the Flickr dataset, with embeddings of the textual descriptions obtained from the Universal Sentence Encoder (USE). Experimental results show that the model produces images that are both semantically and visually coherent. The study highlights the potential of StackGAN for creative content generation and discusses challenges such as maintaining diversity and mitigating artifacts.
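
The two-stage data flow described in the abstract can be illustrated with a shape-level sketch. This is not the authors' implementation and contains no trained networks: both generator functions are hypothetical placeholders, and the embedding size (512), noise size (100), and resolutions (64 and 256 pixels) are assumptions chosen for illustration, not values stated in this record.

```python
import random

EMBED_DIM = 512   # assumption: size of one sentence embedding
NOISE_DIM = 100   # assumption: illustrative noise vector size
LOW_RES = 64      # assumption: Stage-I output resolution
HIGH_RES = 256    # assumption: Stage-II output resolution


def stage1_generator(text_embedding, noise):
    """Hypothetical stand-in for Stage-I: embedding + noise -> coarse image."""
    # A trained network would go here. This placeholder only mixes the
    # inputs into a single value per pixel so the shapes can be followed.
    mixed = (sum(text_embedding) + sum(noise)) / (len(text_embedding) + len(noise))
    return [[[mixed] * 3 for _ in range(LOW_RES)] for _ in range(LOW_RES)]


def stage2_generator(low_res_image, text_embedding):
    """Hypothetical stand-in for Stage-II: coarse image + embedding -> refined image."""
    scale = HIGH_RES // LOW_RES
    text_bias = sum(text_embedding) / len(text_embedding)
    # Nearest-neighbour upsampling stands in for learned refinement; the
    # text embedding is used again, as Stage-II is conditioned on the text.
    return [
        [
            [channel + text_bias for channel in low_res_image[y // scale][x // scale]]
            for x in range(HIGH_RES)
        ]
        for y in range(HIGH_RES)
    ]


def synthesize(text_embedding, rng):
    """Run both stages in sequence and return (coarse, refined) images."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    coarse = stage1_generator(text_embedding, noise)
    refined = stage2_generator(coarse, text_embedding)
    return coarse, refined


rng = random.Random(0)
# A random vector stands in for a real sentence embedding.
fake_embedding = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
coarse, refined = synthesize(fake_embedding, rng)
print(len(coarse), len(coarse[0]), len(coarse[0][0]))
print(len(refined), len(refined[0]), len(refined[0][0]))
```

The point of the sketch is only the chaining: Stage-II receives both the Stage-I output and the same text embedding, which is what allows it to add detail consistent with the description rather than simply enlarging the coarse image.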