Research And Application Of Image Description Generation Algorithm Based On Deep Learning

Posted on: 2022-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Wang
GTID: 2518306530980649
Subject: Computer technology
Abstract/Summary:
Image caption generation integrates the fields of Computer Vision (CV) and Natural Language Processing (NLP). It has wide application in navigation and social assistance for blind people, human-computer interaction, AI art creation, image retrieval, and children's education. Unlike tasks such as object detection and image classification, image caption generation describes an image in sentences that conform to human natural language habits. This requires the model not only to distinguish the entities in the image, but also to recognize other semantic information, such as the actions and inherent attributes of the entities, and to understand the relationships among entities and between entities and their environment. With the continuous development of deep learning techniques and computing frameworks, the Encoder-Decoder model based on deep learning has achieved good results on image caption generation. However, the model simply maps images and text into the same vector space and directly ignores the semantic gap between images and natural language. This thesis focuses on image caption generation algorithms based on deep learning. The main work is as follows:

1. The end-to-end image caption generation problem is transformed into a Seq2Seq problem. Although the Encoder-Decoder architecture has become the mainstream approach to image caption generation, such models describe images as a black box that is difficult to control from the outside, and there are few ways to control how the decoder generates a caption. In response, this thesis borrows the idea of machine translation. Building on the original Encoder-Decoder architecture, it uses spaCy to extract a sequence of entity blocks from the image caption and uses that sequence as a control signal to assist and guide the Long Short-Term Memory (LSTM) network in generating the caption. This "white-boxes" the image caption generation problem, narrows the semantic gap between images and natural language, and makes the caption generation process controllable.

2. A new image caption generation method based on a block sentinel and an improved adaptive attention mechanism is proposed. Most current image caption generation models use decoders with overly simple structures, which makes it difficult for them to generate high-quality captions. To solve this problem, this thesis proposes an improved deep learning model that builds on the existing image caption generation algorithm with an adaptive attention mechanism. The model takes the image entity block sequence as the control signal. In addition, a block sentinel is designed to control switching between entity blocks, and a two-layer LSTM with an improved adaptive attention mechanism is introduced as the caption generator. Experiments on the MSCOCO and Flickr30k datasets show that the proposed model outperforms current mainstream image caption generation methods in caption controllability, quality, and diversity.

3. Reinforcement learning is introduced to address "exposure bias" and the mismatch between the model's training objective and its evaluation metrics, further improving the captions the model produces. The baseline model is first trained with Cross-Entropy (CE) loss and early stopping, and then trained further by directly optimizing the CIDEr metric. Experimental results on the MSCOCO and Flickr30k datasets show that this method can significantly improve the model's performance.
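The abstract does not say which reinforcement learning formulation is used to optimize CIDEr directly. A common choice is a policy-gradient (REINFORCE) loss with a reward baseline, as in self-critical sequence training, and the sketch below assumes that formulation. The function names and the numeric rewards are hypothetical placeholders; in practice the rewards would be CIDEr scores of a sampled caption and a baseline caption.

```python
import math


def sequence_log_prob(token_probs):
    """Log-probability of a sampled caption: sum of per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs)


def self_critical_loss(sampled_token_probs, sampled_reward, baseline_reward):
    """REINFORCE loss with a baseline (assumed formulation, not the thesis's code).

    loss = -(r_sample - r_baseline) * log p(sampled caption)

    Minimizing this loss raises the probability of captions that score
    above the baseline and lowers the probability of those that score below it.
    """
    advantage = sampled_reward - baseline_reward
    return -advantage * sequence_log_prob(sampled_token_probs)


# Hypothetical numbers: a two-token sampled caption whose reward (standing in
# for CIDEr) is 1.2, compared with a baseline caption whose reward is 1.0.
loss = self_critical_loss([0.5, 0.5], sampled_reward=1.2, baseline_reward=1.0)
print(loss)
```

Because the reward enters only as a scalar weight on the log-probability, the metric itself does not need to be differentiable, which is what lets training target the evaluation metric rather than the CE objective.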
Keywords/Search Tags:Deep learning, Image caption generation, Seq2Seq model, Control signal, Attention mechanism, Sentinel mechanism