Research On Text Generation Of An Image Based On Generative Adversarial Networks

Posted on: 2021-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: M F Liu
GTID: 2428330620964158
Subject: Engineering
Abstract/Summary:
In recent years, with the development of deep learning and breakthroughs in hardware technology, applications based on artificial intelligence have appeared in many fields, and a large number of researchers have developed a strong interest in this area. Examples include face recognition, face generation, target detection and tracking, scene segmentation, automatic driving, pedestrian recognition, and speech recognition. Image understanding is a comprehensive area: it requires both natural language processing and computer vision, because it involves processing image data and text data together. An image captioning algorithm not only uses computer vision methods to extract image features and the correlations among them, but also needs to generate text that describes them. More importantly, the model must capture the semantic content of the image and generate descriptions that humans can understand. With the development of machine translation and big data, interest in image understanding has grown strongly. At present, image captioning methods are mainly based on the encoder-decoder model. The encoder is usually a convolutional neural network, which extracts image features. The decoder is a recurrent neural network (RNN), which generates sentences. The main work of this paper on image captioning is as follows:

(1) Because RNNs suffer from the vanishing gradient problem, the long short-term memory network (LSTM), a special RNN structure, was developed; it has long-term memory and alleviates the vanishing gradient problem to a certain extent. This paper therefore uses an LSTM in the decoding part. However, the sentences generated by this method are often too rigid and lack variability. This is because existing image captioning techniques mainly train the model by maximum likelihood estimation, that is, by maximizing the likelihood of the training samples.

(2) Because text generated by the above method lacks diversity of expression, this paper also aims to improve the naturalness and diversity of the text descriptions. Specifically, it uses a framework based on generative adversarial networks, which combines a generator that produces image-based descriptions with an evaluator that assesses how well a description matches the image content. A policy gradient method is used to overcome the fact that a generative adversarial network cannot backpropagate through discrete samples. With the policy gradient, the generator can receive feedback early in training, and the Monte Carlo algorithm is used to generate complete sentences. This method noticeably increases the diversity of the image descriptions.

(3) Finally, this paper combines generative adversarial networks with an unsupervised learning method, so that sentences can be generated with generative adversarial networks when no paired image-text data is available, and a visual detector is used to guide the generator toward sentences related to the image.
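
The policy gradient training with Monte Carlo rollouts described in point (2) can be illustrated with a minimal sketch. Everything below is a toy assumption, not the thesis's implementation: the four-word vocabulary, the context-free generator (a single logit vector instead of an image-conditioned LSTM), the hand-written `toy_evaluator` standing in for the learned evaluator, and the rollout baseline used to reduce variance. The point it shows is that a partial sentence is scored by sampling completions and averaging the evaluator's scores, and that this score weights the gradient of the log-probability of the sampled discrete token.

```python
import math
import random

# Toy vocabulary and sentence length (assumptions for illustration only).
VOCAB = ["a", "dog", "runs", "<eos>"]
DOG, EOS = VOCAB.index("dog"), VOCAB.index("<eos>")
MAX_LEN = 4


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_complete(tokens):
    """A sentence is complete at MAX_LEN tokens or after <eos>."""
    return len(tokens) >= MAX_LEN or (bool(tokens) and tokens[-1] == EOS)


def toy_evaluator(tokens):
    """Hypothetical stand-in for the evaluator: 1.0 if the sentence mentions 'dog'."""
    return 1.0 if DOG in tokens else 0.0


def rollout_value(prefix, logits, rng, n_rollouts=16):
    """Monte Carlo estimate: complete the prefix n_rollouts times, average the scores."""
    probs = softmax(logits)
    total = 0.0
    for _ in range(n_rollouts):
        tokens = list(prefix)
        while not is_complete(tokens):
            tokens.append(rng.choices(range(len(VOCAB)), weights=probs)[0])
        total += toy_evaluator(tokens)
    return total / n_rollouts


def train(episodes=200, lr=0.5, seed=0):
    """REINFORCE-style updates; the reward for each token comes from rollouts."""
    rng = random.Random(seed)
    logits = [0.0] * len(VOCAB)  # context-free generator (assumption)
    for _ in range(episodes):
        tokens = []
        while not is_complete(tokens):
            probs = softmax(logits)
            baseline = rollout_value(tokens, logits, rng)  # value before the token
            tok = rng.choices(range(len(VOCAB)), weights=probs)[0]
            tokens.append(tok)
            advantage = rollout_value(tokens, logits, rng) - baseline
            # Gradient of log softmax(logits)[tok] w.r.t. logits is onehot(tok) - probs.
            for i in range(len(logits)):
                onehot = 1.0 if i == tok else 0.0
                logits[i] += lr * advantage * (onehot - probs[i])
    return logits
```

Calling `softmax(train())` returns the generator's token probabilities after training; in this toy setup the probability of `dog` should rise above its uniform starting value of 0.25, because sentences containing it are the only ones the stand-in evaluator rewards. No gradient passes through the sampling step itself, which is why this kind of estimator suits discrete text.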
Keywords: generative adversarial networks, unsupervised learning, Monte Carlo algorithm, image captioning