Design And Implementation Of Text-Guided Face Image Generation System

Posted on:2024-04-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Jin

GTID:2568306944957099

Subject:Computer technology

Abstract/Summary:

With the advent of the 5G era, applications such as short video, live streaming, digital humans, and the metaverse are booming, and their key enabling technology, Artificial Intelligence Generated Content (AIGC), has received widespread attention. Demand for personalized creation is growing steadily. To meet users' broad needs and stimulate their creativity while improving content diversity and reducing production costs, the industry has extensively explored and applied text-guided image generation to automate content creation. During image generation, however, cross-modal representations and visual-linguistic similarities are difficult for a model to learn, and a latent code initialization strategy and optimization mechanism are lacking. As a result, the generated images suffer from low quality, low diversity, and weak relevance to the text. This thesis addresses these problems.

Firstly, an image-text matching module based on multi-modal alignment and fusion is designed. The module uses the Transformer architecture to align single-modal embedding features and fuse multi-modal embedding features, thereby enhancing the model's image-text matching ability. It computes the similarity of a given image-text pair, and can serve both as a supervision signal during model training and as an objective metric for evaluating image-text similarity.

Secondly, a style-based image latent code initialization module and an iterative latent code optimization mechanism are proposed, and the StylBEF framework is built on them. Using a style-based image generation model and its latent space, the initialization strategy prevents the latent code from falling into low-density regions of the latent space, which improves the quality of the generated images. Building on this initialization strategy, the iterative optimization mechanism further improves the diversity of the generated images and their similarity to the text. Extensive experiments and comparisons demonstrate the effectiveness of the proposed method.

Finally, a text-guided face image generation system is designed and implemented; it can be deployed easily on a single GPU. The system implements the proposed StylBEF model together with baseline models, generates images automatically from user-provided text, and lets users customize hyperparameters to meet their diverse needs and support intelligent creation.
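
The general idea of initializing a latent code near a high-density region and then iteratively optimizing it against an image-text similarity score can be illustrated with a toy NumPy sketch. This is not the StylBEF implementation, which the abstract does not specify. Everything below is an assumption made for illustration: the random linear map `W` stands in for a generator followed by an image encoder, `text_emb` stands in for a text embedding, cosine similarity stands in for the Transformer-based matching module, the truncation-style initialization with `psi` is borrowed from common style-based GAN practice, and all function names are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def init_latent(rng, latent_dim, n_samples=1000, psi=0.5):
    """Truncation-style initialization (an assumption, not the thesis's method).

    Estimate the mean latent from many random samples, then pull a fresh
    sample toward that mean so the starting point stays in a dense region.
    """
    samples = rng.standard_normal((n_samples, latent_dim))
    w_avg = samples.mean(axis=0)
    z = rng.standard_normal(latent_dim)
    return w_avg + psi * (z - w_avg)


def similarity_grad(W, w, t):
    """Analytic gradient of cos(W @ w, t) with respect to the latent w."""
    e = W @ w
    ne, nt = np.linalg.norm(e), np.linalg.norm(t)
    grad_e = t / (ne * nt) - (e @ t) * e / (ne**3 * nt)
    return W.T @ grad_e


def optimize_latent(W, w0, t, steps=200, lr=0.05):
    """Gradient ascent on the similarity; a step is kept only if it helps."""
    w = w0.copy()
    best = cosine_similarity(W @ w, t)
    for _ in range(steps):
        candidate = w + lr * similarity_grad(W, w, t)
        score = cosine_similarity(W @ candidate, t)
        if score > best:
            w, best = candidate, score
        else:
            lr *= 0.5  # step was too large, so shrink it and retry
    return w, best


rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))       # hypothetical generator + image encoder
text_emb = rng.standard_normal(16)     # hypothetical text embedding

w0 = init_latent(rng, latent_dim=8)
sim_before = cosine_similarity(W @ w0, text_emb)
w_opt, sim_after = optimize_latent(W, w0, text_emb)
print(f"similarity before: {sim_before:.3f}, after: {sim_after:.3f}")
```

In a real system the linear map would be replaced by a pretrained style-based generator and encoder, and the gradient would come from automatic differentiation rather than a hand-derived formula.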

Keywords/Search Tags:

text-to-image generation, multi-modal learning, generative adversarial network, latent code, intelligent creation

Related items

1	Research On Image Fusion Method Based On Latent Space And Generative Models
2	Research On Text Description Image Generation Based On Generative Adversarial Network
3	Dual-channel Consistency Constraint Generative Adversarial Network For Text-guided Image Generation
4	Research On Text-to-image Generation Based On Multi-stage And Multi-task Generative Adversarial Network
5	Latent Space Distribution Learning In Generative Adversarial Networks And Application
6	Research And Application Of Text-to-Image Technology Based On Multi-modal Pre-training
7	Research On Text-to-Image Synthesis Based On Generative Adversarial Network
8	Research On Cross Modal Text Generation Image Based On Generative Adversarial Network
9	Research On Cross-modal Image Generation Based On Generative Adversarial Network
10	Text To Image Generation Based On Generative Adversarial Network