
Text-to-Image Synthesis Based On Semantic Correlation Mining

Posted on: 2021-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2518306050470444
Subject: Circuits and Systems
Abstract/Summary:
In the era of big data, simply retrieving images from large image collections can no longer satisfy people's needs, and how to create images that meet specific needs from natural-language descriptions has attracted widespread attention. In recent years, with the rapid development of generative adversarial networks, significant breakthroughs have been made in text-to-image synthesis. However, because parsing a sentence and bridging the semantic gap between sentence and image are very difficult, text-to-image synthesis remains an open problem. The task faces two major challenges. One is visual realism: generating rich yet detailed images from text with only a limited number of words is difficult. The other is semantic consistency: establishing the relationships between text semantics and visual features is problematic. This thesis studies these issues in depth, and the main research contents are as follows:

(1) To address the visual differences between images generated from different sentences that describe the same image, this thesis proposes a multi-sentence auxiliary generation model based on a dual attention mechanism. Previous methods focus only on a paired sentence and image during generation and ignore the one-to-many relationship between sentences and an image, so the model cannot learn the semantic correlation among multiple sentences that correspond to the same image, which leads to large visual differences between images generated from sentences with the same semantic content. To consider the semantic relationship between an image and multiple sentences simultaneously, this thesis proposes a single-sentence generation and multi-sentence discrimination module: in the generation stage, a single target sentence is used to generate images at different resolutions so that the network learns the semantics specific to the target sentence; in the discrimination stage, multiple different sentences are used as conditions so that images generated from different sentences remain semantically consistent. Meanwhile, to generate more details, this thesis proposes a detail enhancement module based on the dual attention mechanism to obtain more fine-grained images. Experiments on the Oxford-102 and CUB datasets demonstrate the effectiveness of the method.

(2) To further improve the performance of the generative model, this thesis proposes a progressive negative sample learning mechanism. Negative sample learning is an important means of accelerating convergence and improving model performance. Most methods randomly select an image as a negative sample, but such a sample is too easy to be of much help during training, so this thesis explores several retrieval-based negative sample selection mechanisms. On this basis, a new progressive negative sample training strategy is proposed: negative samples are divided into different difficulty levels, and the difficulty is gradually increased during training to improve model performance and thus obtain higher-quality images. Experiments on the Oxford-102 and CUB datasets demonstrate the effectiveness of the method.
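To make the multi-sentence discrimination idea in contribution (1) more concrete, the following is a minimal PyTorch-style sketch, not the thesis's actual implementation: one generated image is scored by a conditional discriminator against several sentence embeddings that all describe the same ground-truth image, and the conditional losses are averaged. The function names and the binary cross-entropy formulation are illustrative assumptions.

```python
# Minimal sketch (PyTorch-style): discriminate an image against every caption
# of the same ground-truth image, so images generated from different captions
# of one image are pushed toward consistent semantics.
import torch
import torch.nn.functional as F

def multi_sentence_d_loss(discriminator, fake_img, real_img, sent_embs):
    """sent_embs: tensor of shape (num_sentences, emb_dim), all captions of
    one image. The generator was conditioned on a single target sentence;
    here the discriminator is conditioned on each caption in turn."""
    d_loss = 0.0
    for s in sent_embs:                      # loop over caption conditions
        s = s.unsqueeze(0)                   # (1, emb_dim) condition vector
        real_logit = discriminator(real_img, s)
        fake_logit = discriminator(fake_img.detach(), s)
        d_loss = d_loss + F.binary_cross_entropy_with_logits(
            real_logit, torch.ones_like(real_logit)
        ) + F.binary_cross_entropy_with_logits(
            fake_logit, torch.zeros_like(fake_logit)
        )
    return d_loss / len(sent_embs)           # average over all captions
```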
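Similarly, the progressive negative sample strategy in contribution (2) can be sketched as a simple curriculum: candidate negatives retrieved for a caption are ranked by their similarity to that caption, split into difficulty levels, and harder levels are sampled as training progresses. The number of levels, the linear schedule, and the helper names below are illustrative assumptions rather than the thesis's exact settings.

```python
# Minimal sketch: a difficulty curriculum over retrieval-based negatives.
import random

def build_levels(candidates, similarity, num_levels=3):
    """Rank candidate negatives by text-image similarity and split them into
    difficulty levels: low similarity = easy negatives, high similarity =
    hard negatives that are close to the positive pair."""
    ranked = sorted(candidates, key=similarity)
    step = max(1, len(ranked) // num_levels)
    return [ranked[i * step: None if i == num_levels - 1 else (i + 1) * step]
            for i in range(num_levels)]

def sample_negative(levels, epoch, total_epochs):
    """Pick the difficulty level from the current training progress and draw
    one negative from it, so harder negatives appear later in training."""
    progress = epoch / max(1, total_epochs - 1)            # 0.0 -> 1.0
    level = min(int(progress * len(levels)), len(levels) - 1)
    return random.choice(levels[level])
```

The key design choice illustrated here is that difficulty is scheduled by training progress rather than fixed, so early epochs see easy negatives while later epochs are forced to separate semantically similar pairs.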
Keywords/Search Tags:Generative Adversarial Networks, Attention Mechanism, Semantic Consistency, Negative Sample Learning, Progressive Training