
Image Captioning Technology Based On Joint Attention Mechanism

Posted on: 2021-08-18    Degree: Master    Type: Thesis
Country: China    Candidate: L Zhang    Full Text: PDF
GTID: 2518306122974719    Subject: Computer Science and Technology
Abstract/Summary:
Recently, with the rapid development of artificial intelligence technology, the performance of image captioning has improved greatly, especially through the successful application of the encoder-decoder framework to this task, which overcomes the single sentence style and low accuracy of traditional methods. In the encoder-decoder framework, the decoder introduces an attention mechanism to mine the local features of the image, so that it can predict the corresponding words more accurately. However, existing attention-based image captioning methods have two problems. First, at each time step of training, the model uses only the local information of a single image, which is not conducive to learning the common characteristics of a visual object. Second, when visual objects are obscure or scarce in the training images, it is difficult for the model to predict these visual objects accurately.

To tackle these problems, this paper studies a joint attention mechanism that improves the recognition of visual objects in image captioning. Compared with current image captioning algorithms, this paper makes innovations in three aspects: algorithm theory, algorithm structure, and application value, summarized as follows:

1. This paper proposes the concept of a joint attention mechanism. Compared with the traditional single-sample attention mechanism, it can explore local regions of multiple images at the same time, thereby improving the model's ability to learn visual objects.

2. In terms of algorithm structure, virtual LSTM units are introduced. Multiple virtual LSTM units can receive multiple image region features and learn from them simultaneously, thereby capturing the commonality of visual objects more accurately.

3. In practical applications, this method can mitigate the visual deviation between different domains, thereby solving the transfer learning problem in image captioning tasks to a certain extent and saving the cost of sample labeling.

To verify the effectiveness of the proposed method, we conducted extensive experiments on the MSCOCO and Flickr30K datasets. The experimental results show that our method significantly improves the B-1 and F-1 scores, demonstrating that the joint attention mechanism improves the accuracy of visual object recognition in image captioning. Moreover, our method outperforms state-of-the-art methods on various evaluation metrics and solves the transfer learning problem in the field of image captioning to a certain extent.
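The contrast between single-sample attention and the joint attention described above can be sketched as follows. This is a minimal, illustrative dot-product attention in plain Python, not the thesis's actual model: the function names (`attend`, `joint_attend`) and the simple pooling of region features across several images are assumptions made for exposition only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(hidden, regions):
    """Single-sample attention: weight one image's region features by
    their dot-product similarity to the decoder hidden state, and return
    the weighted-sum context vector plus the attention weights."""
    scores = [sum(h * r for h, r in zip(hidden, reg)) for reg in regions]
    weights = softmax(scores)
    dim = len(hidden)
    context = [sum(w * reg[d] for w, reg in zip(weights, regions))
               for d in range(dim)]
    return context, weights

def joint_attend(hidden, region_groups):
    """Joint attention, in the spirit of the abstract: pool region
    features from several images that contain the same visual object,
    then attend over all of them at once, so the object's common
    appearance (rather than one image's idiosyncrasies) shapes the
    context vector used to predict the next word."""
    all_regions = [reg for group in region_groups for reg in group]
    return attend(hidden, all_regions)
```

In a full model, each virtual LSTM unit would consume one such context vector per image; here the pooled `joint_attend` call merely illustrates why attending over multiple images at a time step exposes the shared characteristics of a visual object.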
Keywords/Search Tags:Attention mechanism, LSTM, Image captioning, Transfer learning