Text-to-image synthesis is a cross-modal task that aims to generate photo-realistic images from text descriptions. It involves research in both natural language processing and computer vision and has great potential in a variety of real-world applications, making it of significant research value. The mainstream models in this field currently use Generative Adversarial Networks and produce images of high quality. However, two problems remain. First, in the commonly used multi-stage structure, the intermediate-stage networks perform repetitive work and are not used efficiently to refine the details of the generated image. Second, existing text-to-image models are trained on a limited amount of data, so the semantic space they learn is not accurate enough to guarantee the quality of every generated image. To address these problems, this thesis conducts the following research.

For the first problem, this thesis proposes a multi-path text-to-image synthesis structure based on feature fusion, which establishes an efficient feature-fusion mechanism for the multi-stage text-to-image synthesis task to improve the quality of generated images. The proposed multi-path structure has two main components: a staged residual connection and a multi-scale module. The staged residual connection transfers the feature maps of the image generated in the previous stage to the end of the current stage; this path avoids the need for long-term memory and guides the network to focus on modifying and enhancing the details of the generated image. The multi-scale module extracts features at different spatial scales and adaptively integrates these feature maps through a channel attention mechanism, yielding images with richer and finer details. The proposed multi-path method can serve as a common framework for multiple multi-stage models aimed at generating highly detailed images.

For the second problem, this thesis proposes a text-to-image synthesis method based on semantic data augmentation. To compensate for insufficient training data, the approach combines a loss mechanism built on semantic data augmentation with a module that aligns semantic information between text and images. The loss mechanism derives an upper bound on the expected loss under semantic data augmentation that can be computed probabilistically, avoiding the explicit construction of a large augmented sample set. In addition, a text-image alignment module based on contrastive learning is incorporated into the model to further improve the consistency of semantic information between the text and the generated images.

The proposed methods are evaluated extensively on the CUB-200 and COCO datasets. The experimental results demonstrate that they effectively exploit the feature information available in images and text, leading to a significant improvement in the quality of the images generated by the text-to-image model.
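To make the multi-path structure concrete, the following is a minimal PyTorch sketch of its two components: a multi-scale module that fuses features from parallel branches via channel attention, and a stage whose output adds a staged residual connection from the previous stage. All module names, layer sizes, and dilation rates here are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Extract features at several spatial scales and fuse them adaptively
    with squeeze-and-excitation style channel attention (illustrative sketch)."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        # Parallel 3x3 branches with different dilation rates give
        # receptive fields at different spatial levels.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        fused = channels * len(dilations)
        # Channel attention: global pooling -> bottleneck -> sigmoid gates.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(fused // 4, fused, 1), nn.Sigmoid(),
        )
        self.project = nn.Conv2d(fused, channels, 1)

    def forward(self, x):
        feats = torch.cat([F.relu(b(x)) for b in self.branches], dim=1)
        feats = feats * self.attn(feats)  # reweight each scale's channels
        return self.project(feats)

class Stage(nn.Module):
    """One refinement stage with a staged residual connection: the previous
    stage's feature map is added back at the end of the current stage, so
    the stage only needs to learn a residual that refines details."""
    def __init__(self, channels):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            MultiScaleFusion(channels),
        )

    def forward(self, prev_feat):
        return prev_feat + self.refine(prev_feat)  # staged residual connection
```

Because each stage learns only a residual on top of the previous stage's features, stacking several such stages matches the abstract's goal of reusing earlier work rather than regenerating the image from scratch at every stage.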
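The abstract does not spell out the exact upper-bound loss, but a well-known instance of the "implicit augmentation via a computable upper bound" idea is ISDA (Wang et al., 2019), where each feature is implicitly perturbed along class-conditional semantic directions and the expected cross-entropy is bounded in closed form. The sketch below follows that formulation as a plausible stand-in; the function name and the classification setting are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def semantic_aug_upper_bound_loss(features, labels, weight, bias, cov, lam):
    """Closed-form upper bound on the expected cross-entropy when each
    feature a is implicitly augmented as a~ ~ N(a, lam * Sigma_y),
    so no explicit augmented sample set is ever materialized.
    features: (B, D), weight: (C, D), bias: (C,),
    cov: (C, D, D) class-conditional covariance estimates.
    Follows ISDA (Wang et al., 2019); a stand-in for the thesis's bound."""
    logits = features @ weight.t() + bias           # (B, C) plain logits
    w_y = weight[labels]                            # (B, D) target-class weights
    diff = weight.unsqueeze(0) - w_y.unsqueeze(1)   # (B, C, D): w_j - w_y
    sigma_y = cov[labels]                           # (B, D, D)
    # Quadratic form (w_j - w_y)^T Sigma_y (w_j - w_y) for every class j;
    # it is zero for j = y, so the target logit is unchanged.
    quad = torch.einsum('bcd,bde,bce->bc', diff, sigma_y, diff)
    aug_logits = logits + 0.5 * lam * quad          # shift non-target logits up
    return F.cross_entropy(aug_logits, labels)
```

The key property is that the augmented logits are an ordinary batch computation, so the bound costs roughly one extra matrix contraction per batch instead of sampling many augmented features.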
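For the contrastive text-image alignment module, a common recipe is a symmetric InfoNCE loss over a batch of matching pairs, as popularized by CLIP. The thesis's exact form is not specified in the abstract, so the function below, including its name and temperature value, should be read as a hedged sketch of the general technique.

```python
import torch
import torch.nn.functional as F

def text_image_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss that pulls matching (image, text) embeddings
    together and pushes mismatched pairs in the batch apart.
    img_emb, txt_emb: (B, D) embeddings of B matching pairs."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature    # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Row i should match column i: score both directions of retrieval.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```

Minimizing this loss aligns the two embedding spaces, which is what enforces the semantic consistency between a caption and its generated image described above.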