Research On Text To Image Generation Algorithm Based On Attention Mechanism And Generative Adversarial Networks

Posted on: 2022-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Huang
GTID: 2518306575965909
Subject: Computer Science and Technology
Abstract/Summary:
Automatic image generation from natural language descriptions has a wide range of applications, for example associating pictures with text, automatically matching images to text, and compressing images by storing text instead of image data. This research can also advance multimodal learning and reasoning across vision and language. In medicine, generating medical case images from diagnostic reports can give doctors a reference for disease diagnosis. To a certain extent it also eases the difficulty of querying past case images after hospitals routinely delete data, which saves hospital storage resources.

At present, all research on text-to-image generation is based on natural images. Because natural images pursue richness of content, existing methods place only weak constraints on the content of generated images, so that content is easily deformed. Such deformation is unacceptable in the medical field, which has high requirements for image content quality. Therefore, this thesis proposes a generative adversarial network based on content preservation and an attention mechanism to keep the content and texture details of generated medical images authentic.

In addition, most current text-to-image algorithms are limited by model size. To improve the quality of the generated image without increasing the number of model parameters, how to better mine and use text features has become one of the current research focuses in this direction. Therefore, building on the generative adversarial network, this thesis combines the attention mechanism with the dense block to improve the utilization of text features and attention features, and thereby the quality of the generated image.

The research work of this thesis is as follows:

1. Text-to-image generation methods developed for natural images place weak constraints on the generated images, which makes them prone to deformation. To improve the quality of the generated images, produce more realistic content and details, and better meet the high requirements of medical imaging, a generative adversarial network based on content preservation and an attention mechanism is proposed. The attention mechanism makes the content of the generated image more consistent with the description in the diagnostic report. The content-preserving loss optimizes the texture details of the image using both shallow and deep features, making the pathological details more realistic. Experiments were performed on an ultrasound dataset and an X-ray dataset. On the ultrasound dataset, GAN-test increased by 10.88% and GAN-train by 20% over the best existing method. On the Open-i dataset, GAN-test increased by 24% and GAN-train by 11.58% over the best existing method.

2. To make further use of text features and attention features, so that the content of the generated image is more consistent with the text description and its details are improved, a text-to-image generative adversarial network based on the dense block and attention mechanism is proposed. Each layer of the model iteratively completes the image using shallow and deep features of the text and attention features. Subjective and objective evaluation experiments were carried out on the CUB and COCO datasets. On the CUB dataset, the IS score increased by 2.06% compared with AttnGAN, and the FID score improved by 25.23% compared with the best method. On the COCO dataset, the IS score increased by 18.67% and the FID score improved by 5.84% compared with the best method.

3. Two text-to-image simulation systems, one for medical images and one for natural images, are designed and implemented. Users choose the system that matches the image type. After the user sets the hyperparameters, the system automatically outputs the corresponding image. For medical images, it supports generating ultrasound images of the liver, gallbladder and kidney and X-ray images of the lung. For natural images, it supports generating bird images and complex natural scene images. The two systems provide easily extensible platforms for future research.
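The abstract does not give the form of the attention mechanism, so the following is only a minimal sketch of one common choice: word-level attention in the style of AttnGAN, the baseline named above. It assumes that image sub-region features and word features have already been projected to the same dimension. The function name `word_attention`, the array shapes, and the random inputs are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def word_attention(regions, words):
    """Hypothetical word-level attention (AttnGAN-style sketch).

    regions: (N, D) array, one feature vector per image sub-region.
    words:   (T, D) array, one feature vector per word of the description.
    Returns (context, weights) where weights is (N, T) and context is (N, D).
    """
    # Dot-product similarity between every region and every word.
    scores = regions @ words.T                             # (N, T)
    # Subtract the row-wise max before exponentiating for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    # Softmax over words: each region's weights sum to 1.
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Word-context vector per region: attention-weighted sum of word features.
    context = weights @ words                              # (N, D)
    return context, weights

# Illustrative inputs only: 16 sub-regions, 5 words, feature dimension 8.
rng = np.random.default_rng(0)
regions = rng.normal(size=(16, 8))
words = rng.normal(size=(5, 8))
context, weights = word_attention(regions, words)
```

In a generator of this kind, the per-region context vectors would typically be concatenated with the image features before the next upsampling stage. How the thesis combines them with its dense block is not stated in the abstract.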
Keywords/Search Tags: text-to-image, generative adversarial network, attention mechanism, content preservation, dense block