
Research of Image Caption Based on Attention Mechanism

Posted on: 2021-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: S Qiao
GTID: 2428330620461345
Subject: Computer Science and Technology

Abstract/Summary:
Image captioning combines computer vision and natural language processing: it converts an input image into text describing the image's content, realizing a conversion from vision to language. It has broad application prospects in image retrieval, human-computer interaction, children's education, and other areas. Generating an image caption relies on a thorough understanding of the image content, so the model must identify not only the objects but also further image information such as the background, motions, attributes, and the semantic relationships between objects. Traditional image captioning is based on templates or retrieval and depends heavily on those templates or on existing text descriptions, so the resulting sentences are simple and similar to one another. With the development of deep neural networks, the encoder-decoder framework based on deep learning has achieved great success in image captioning, but existing methods still suffer from problems such as high error rates and poor caption quality. Accordingly, this thesis focuses on image captioning based on deep learning. The main work is as follows:

(1) An image caption generation method based on the attention mechanism and a bidirectional Long Short-Term Memory network (Bi-LSTM) is proposed. In existing methods, the LSTM decoder relies only on previously generated information when producing the word at the current time step, which makes it difficult to generate an accurate image description. To address this issue, this thesis proposes an improved model: VGGNet19 extracts the image features, the attention mechanism computes the weight of each image region at each time step, and the image context vector obtained as the weighted sum is fed into a Bi-LSTM, so that the decoder makes full use of the context information when decoding and generates a more accurate description of the image. Compared with the benchmark model on the MSCOCO dataset, the BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR scores improved by 3.11%, 6.09%, 6.98%, 7.41%, and 7.53%, respectively. The experimental results show that using a Bi-LSTM in decoding effectively improves the performance of the model.

(2) An image caption generation method based on image features and text features is proposed. In existing methods, the LSTM computes the probability of the output word at the current time step from the previously generated words, so when a predicted word is inaccurate the entire output sentence can deviate from the real content of the image. To address this issue, this thesis proposes another improved model: TF-IDF and Word2Vec convert the manually annotated sentences of the images into text feature vectors, VGGNet19 extracts the image feature vector, and the attention mechanism computes an image context vector and a text context vector, both of which are fed into the LSTM. When predicting the output word, the image information and the text information are integrated effectively, which reduces incorrect words, produces more accurate sentences, and makes the caption closer to the meaning of the image. Compared with the benchmark model on the MSCOCO dataset, the BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR scores improved by 4.10%, 5.49%, 8.14%, 9.47%, and 6.28%, respectively. The experimental results show that using text features in image captioning effectively improves the performance of the model.
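To make the first method concrete, the sketch below illustrates one decoding step in PyTorch: additive attention over VGGNet19 region features produces a weighted-sum image context vector, which is concatenated with the current word embedding and passed through a bidirectional LSTM. This is a minimal sketch under stated assumptions, not the thesis implementation; the class name, layer sizes, and the single-step state handling are illustrative.

```python
# Minimal sketch (PyTorch) of one decoding step, not the thesis code:
# additive attention over VGGNet19 region features yields a weighted-sum
# image context vector, which is concatenated with the current word
# embedding and fed through a bidirectional LSTM. Layer sizes, names,
# and the single-step state handling are illustrative assumptions.
import torch
import torch.nn as nn

class AttentiveBiLSTMDecoder(nn.Module):
    def __init__(self, feat_dim=512, embed_dim=512, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.att_feat = nn.Linear(feat_dim, hidden_dim)        # project image regions
        self.att_hid = nn.Linear(2 * hidden_dim, hidden_dim)   # project decoder state
        self.att_score = nn.Linear(hidden_dim, 1)              # scalar score per region
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim + feat_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, regions, token, state):
        # regions: (B, R, feat_dim) flattened VGGNet19 conv features
        # token:   (B,) ids of the previously generated words
        # state:   (h, c), each of shape (2, B, hidden_dim)
        h = state[0].transpose(0, 1).reshape(regions.size(0), 1, -1)   # (B, 1, 2H)
        scores = self.att_score(torch.tanh(self.att_feat(regions) + self.att_hid(h)))
        alpha = torch.softmax(scores, dim=1)          # (B, R, 1) weight of each region
        context = (alpha * regions).sum(dim=1)        # (B, feat_dim) weighted sum
        x = torch.cat([self.embed(token), context], dim=-1).unsqueeze(1)
        y, state = self.bilstm(x, state)              # one Bi-LSTM decoding step
        return self.out(y.squeeze(1)), alpha.squeeze(-1), state

# Toy usage: a 14x14 VGGNet19 feature map flattened to 196 regions of size 512.
decoder = AttentiveBiLSTMDecoder()
regions = torch.randn(4, 196, 512)
state = (torch.zeros(2, 4, 512), torch.zeros(2, 4, 512))
logits, alpha, state = decoder(regions, torch.zeros(4, dtype=torch.long), state)
```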
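The second method's combination of an image context vector and a text context vector can be sketched in the same spirit. In the hypothetical sketch below, two additive-attention modules attend over image region features and over text feature vectors (for example, TF-IDF weighted Word2Vec vectors of the annotated sentences), and both context vectors are concatenated with the word embedding before one LSTM step. The class names, dimensions, and the exact construction of the text features are assumptions, not the thesis implementation.

```python
# Minimal sketch (PyTorch) of the dual-context idea, not the thesis code:
# one additive-attention module attends over image region features and
# another over text feature vectors; both context vectors are fed to an
# LSTM step. All dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, hidden_dim)
        self.w_hid = nn.Linear(hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, feats, h):
        # feats: (B, N, feat_dim); h: (B, hidden_dim) previous decoder hidden state
        e = self.score(torch.tanh(self.w_feat(feats) + self.w_hid(h).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)           # attention weight per element
        return (alpha * feats).sum(dim=1)         # weighted-sum context vector

class ImageTextDecoder(nn.Module):
    def __init__(self, img_dim=512, txt_dim=300, embed_dim=512,
                 hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.img_att = AdditiveAttention(img_dim, hidden_dim)
        self.txt_att = AdditiveAttention(txt_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.cell = nn.LSTMCell(embed_dim + img_dim + txt_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feats, txt_feats, token, state):
        # img_feats: (B, R, img_dim) VGGNet19 regions; txt_feats: (B, T, txt_dim)
        h, c = state
        img_ctx = self.img_att(img_feats, h)      # image context vector
        txt_ctx = self.txt_att(txt_feats, h)      # text context vector
        x = torch.cat([self.embed(token), img_ctx, txt_ctx], dim=-1)
        h, c = self.cell(x, (h, c))               # one LSTM decoding step
        return self.out(h), (h, c)
```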
Keywords/Search Tags: Image Caption, Attention Mechanism, Bi-LSTM, image feature, text feature