
Design And Implementation Of Text-Guided Face Image Generation System

Posted on: 2024-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Jin
GTID: 2568306944957099
Subject: Computer technology
Abstract/Summary:
With the advent of the 5G era, applications such as short video, live streaming, digital humans and the metaverse are booming, and the key technology behind them, Artificial Intelligence Generated Content (AIGC), has received widespread attention. Demand for personalized creation is growing steadily. To meet this demand and stimulate users' creativity while improving content diversity and reducing production costs, the industry has extensively explored and applied text-guided image generation to automate content creation. During image generation, however, cross-modal representations and visual-linguistic similarities are difficult for a model to learn, and existing methods lack a latent code initialization strategy and an optimization mechanism. As a result, the generated images have low quality, low diversity and weak relevance to the text. This thesis addresses these problems.

Firstly, an image-text matching module based on multi-modal alignment and fusion is designed. The module uses the Transformer architecture to align single-modal embedding features and to fuse multi-modal embedding features, which strengthens the model's image-text matching ability. It computes the similarity of a given image-text pair, so it can serve both as a supervision signal during model training and as an objective metric for evaluating image-text similarity.

Secondly, a style-based image latent code initialization module and an iterative latent code optimization mechanism are proposed, and the StylBEF framework is built on them. Using a style-based image generation model and its latent space, the initialization strategy keeps the latent code from falling into low-density regions of the latent space, which improves image generation quality. Building on this initialization, the iterative optimization mechanism further improves the diversity of the generated images and their similarity to the text. Extensive experiments and comparisons demonstrate the effectiveness of the proposed method.

Finally, a text-guided face image generation system is designed and implemented; it can be deployed easily on a single GPU. The system implements the proposed StylBEF model and the baseline models, automatically generates images that correspond to user-provided text, and lets users customize hyperparameters to meet diverse needs and support intelligent creation.
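The abstract does not spell out how the latent code is initialized or optimized, so the following is only a toy sketch of the general idea, not the StylBEF method. It assumes a truncation-style initialization (pulling a random sample toward the mean latent so that it starts in a dense region) followed by gradient ascent on a similarity score with a penalty for drifting away from that mean. Every name here (`mean_latent`, `init_latent`, `similarity`, `optimize_latent`, `psi`, `reg`) is hypothetical, and the similarity function is a simple stand-in for a learned image-text matching module.

```python
import random

def mean_latent(sample_fn, n=1000):
    """Estimate the centre of the latent distribution by averaging samples."""
    samples = [sample_fn() for _ in range(n)]
    dim = len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(dim)]

def init_latent(sample_fn, w_mean, psi=0.5):
    """Pull a random sample toward the mean (psi=0 gives the mean, psi=1 the raw sample)."""
    w = sample_fn()
    return [m + psi * (x - m) for x, m in zip(w, w_mean)]

def similarity(w, text_emb):
    """Toy stand-in for an image-text matching score: negative squared distance."""
    return -sum((a - b) ** 2 for a, b in zip(w, text_emb))

def optimize_latent(w, text_emb, w_mean, steps=100, lr=0.05, reg=0.1):
    """Gradient ascent on the similarity minus a penalty for leaving the dense region."""
    w = list(w)
    for _ in range(steps):
        # Gradient of -(w - t)^2 - reg * (w - m)^2 with respect to w.
        grad = [-2 * (x - t) - 2 * reg * (x - m)
                for x, t, m in zip(w, text_emb, w_mean)]
        w = [x + lr * g for x, g in zip(w, grad)]
    return w

# Assumed toy setup: an 8-dimensional Gaussian latent space and a fixed "text embedding".
rng = random.Random(0)
DIM = 8
sample = lambda: [rng.gauss(0.0, 1.0) for _ in range(DIM)]
text_emb = [0.5] * DIM

w_mean = mean_latent(sample)
w0 = init_latent(sample, w_mean)
w_opt = optimize_latent(w0, text_emb, w_mean)
print(similarity(w0, text_emb), similarity(w_opt, text_emb))
```

In a real system the latent code would be passed through a style-based generator and the resulting image scored against the text by the matching module; here both steps are collapsed into `similarity` so that the example runs without any model weights.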
Keywords/Search Tags: text-to-image generation, multi-modal learning, generative adversarial network, latent code, intelligent creation