Research On Conditional Image Synthesis Based On Image And Text

Posted on: 2021-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Zhou
GTID: 2428330614971177
Subject: Computer Science and Technology
Abstract/Summary:
Deep learning has been applied successfully in many fields, including image, video, text, and speech processing. Its application to multimodal scenarios such as image-text and video-speech combinations, however, is still at an early stage. In particular, research on conditional image synthesis that combines images and text remains limited: existing work covers only a single task type, image editing based on natural language text descriptions, and the editing results are not yet satisfactory. The field nevertheless has considerable application potential, for example in intelligent interactive image processing. This thesis designs two models for two different tasks in conditional image synthesis combining image and text, aiming to broaden the research in this field and to improve on the results of existing methods.

To address the limited research and the single task type in this field, this thesis first proposes a method for partial image synthesis using an image and a natural language text description, and defines the task as partial image synthesis based on natural language text description. The input is an image containing only part of the foreground information, together with a text description of the real image. The goal is to synthesize the background region and thereby produce a visually complete image that matches the text description. Building on existing work in text-to-image synthesis and image inpainting, this thesis proposes and implements the task of using a given text description to synthesize a plausible background for an image that contains only part of the foreground, which enriches the research content in conditional image synthesis combining image and text.

The second research task aims to improve existing results in this field and focuses on image editing based on natural language text descriptions. The input is an original image and a target text description. The output is an edited image that matches the text description as a whole while preserving the details of the original image that are unrelated to the text. Existing methods suffer from low editing accuracy and poor preservation of text-independent regions. To address these problems, this thesis introduces a text encoder pre-trained with an attention mechanism and a purpose-designed decoding unit based on the attention mechanism, and adds a reconstruction loss and a Deep Attentional Multimodal Similarity Model (DAMSM) loss during model training. These changes substantially improve editing accuracy over existing methods, and improve both the editing quality and the preservation of text-independent regions.

Extensive experiments, including multiple sets of comparative experiments, were carried out for both research tasks. The results show that, in qualitative and quantitative comparison with other methods, the partial image synthesis model and the image editing model designed in this thesis both perform well on their respective tasks.
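The abstract states that a reconstruction loss and a DAMSM loss are added during training of the image editing model, but it does not give their exact form or weighting. The following is only a minimal sketch of how such terms are commonly combined into one generator objective. It assumes an L1 (mean absolute error) reconstruction term, a DAMSM loss supplied as a precomputed scalar, and hypothetical weights `lambda_rec` and `lambda_damsm`. None of these choices are confirmed by the source.

```python
from typing import Sequence


def l1_reconstruction_loss(edited: Sequence[float], original: Sequence[float]) -> float:
    """Mean absolute pixel difference between two flattened images.

    Assumption: the abstract does not say which reconstruction loss is used;
    L1 is chosen here purely for illustration.
    """
    if len(edited) != len(original):
        raise ValueError("images must have the same number of pixels")
    if not edited:
        raise ValueError("images must not be empty")
    return sum(abs(e - o) for e, o in zip(edited, original)) / len(edited)


def generator_loss(
    adv_loss: float,
    rec_loss: float,
    damsm_loss: float,
    lambda_rec: float = 1.0,    # hypothetical weight, not from the source
    lambda_damsm: float = 1.0,  # hypothetical weight, not from the source
) -> float:
    """Weighted sum of adversarial, reconstruction, and DAMSM terms.

    The DAMSM term is passed in as a precomputed scalar; computing it
    requires pre-trained image and text encoders, which are out of scope here.
    """
    return adv_loss + lambda_rec * rec_loss + lambda_damsm * damsm_loss


if __name__ == "__main__":
    rec = l1_reconstruction_loss([0.0, 1.0], [0.0, 0.0])
    total = generator_loss(adv_loss=1.0, rec_loss=rec, damsm_loss=2.0,
                           lambda_rec=2.0, lambda_damsm=0.5)
    print(rec, total)
```

In a sketch like this, the reconstruction term penalizes changes to the original image, which is one plausible way to encourage preservation of text-independent regions, while the DAMSM term rewards agreement between the edited image and the target text.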
Keywords/Search Tags: Deep Learning, Generative Adversarial Networks, Conditional Image Synthesis, Multimodality, Attention Mechanism