
Research On Image Caption Based On Attention Mechanism

Posted on: 2021-01-16    Degree: Master    Type: Thesis
Country: China    Candidate: N X Liao    Full Text: PDF
GTID: 2428330626958581    Subject: Information security
Abstract/Summary:
Image captioning is a task that combines computer vision and natural language processing. For a given image, the algorithm must automatically generate understandable text describing the image content, which has strong practical value in image-assisted understanding and cross-modal image-text retrieval. In recent years, research on how to use image convolutional features efficiently to generate better description sentences has become an important direction in image caption generation. Building on current image caption generation methods, this thesis studies feature combination and the use of high-level semantic information:

1) Image captioning based on a class activation mapping attention mechanism. This thesis introduces the class activation mapping (CAM) mechanism into the current attention-based image captioning framework and proposes a captioning framework based on CAM attention that achieves better semantic alignment between the convolutional features and the generated words. Unlike methods that use the spatial features directly, the convolutional features are recombined before the attention step to obtain more suitable and accurate features: on top of the standard soft attention framework, the class activation mapping mechanism reweights the image convolutional features produced by the convolutional neural network. In the caption generation part, to adapt the decoder to the CAM mechanism, a two-layer LSTM network is used to exploit both the global and the local features of the image and effectively improve the model's expressive ability (a minimal sketch of this decoding step is given after the abstract). Results on the MSCOCO, Flickr8k and Flickr30k datasets indicate that performance is significantly improved over current mainstream models. In particular, the ResNet-50-based model trained on MSCOCO improves Bleu-2 by 7.3% over the Soft-Attention model, Bleu-3 by 10.8% over the m-RNN model, and Bleu-4 by 2.5% over the NIC model.

2) An entity-feature-oriented image caption generation method. On top of the CAM-attention captioning framework, an image caption generation method oriented to entity features is proposed. Entity attribute annotations are mined from the given caption sentences, and the resulting entity features are introduced into the current encoder-decoder framework. Compared with directly clustering word vectors, entity feature labels carry clearer semantic information and achieve better performance. Results on the MSCOCO, Flickr8k and Flickr30k datasets show that image convolutional features guided by annotations with clear semantic relationships perform better: such features clearly help the caption generation task, capturing at the global level the semantic relationships between the entities in the image and at the detail level the information about the individual objects (a sketch of this entity-mining step also follows the abstract). In particular, the entity-feature-oriented model trained on MSCOCO improves Bleu-1 by 2.9% over the Soft-Attention model, and over the CAMA model it improves Bleu-3 by 10.5%, Bleu-4 by 10.7%, ROUGE_L by 3.9%, and CIDEr by 9.4%.
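As a rough illustration of the decoding step described in part 1), the sketch below shows one way CAM-weighted convolutional features, soft attention, and a two-layer LSTM decoder could fit together. This is not the thesis implementation: the feature dimensions, the source of the CAM weights, and all module names are assumptions made for the example.

```python
# Minimal sketch (not the thesis code) of one decoding step that combines
# class-activation-map (CAM) weighted convolutional features with soft
# attention and a two-layer LSTM decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CAMAttentionDecoderStep(nn.Module):
    def __init__(self, feat_dim=2048, embed_dim=512, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # scores each spatial location against the current decoder state
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_out = nn.Linear(hidden_dim, 1)
        # layer 1 sees the word embedding plus the global (CAM-pooled) feature,
        # layer 2 sees layer 1's output plus the attended local feature
        self.lstm1 = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.lstm2 = nn.LSTMCell(hidden_dim + feat_dim, hidden_dim)
        self.logit = nn.Linear(hidden_dim, vocab_size)

    def forward(self, conv_feats, cam, prev_word, state1, state2):
        # conv_feats: (B, L, feat_dim) flattened CNN feature map
        # cam:        (B, L) class activation map, one weight per location
        # reweight ("recombine") the convolutional features with the CAM
        cam = F.softmax(cam, dim=1)
        weighted_feats = conv_feats * cam.unsqueeze(-1)         # (B, L, feat_dim)
        global_feat = weighted_feats.sum(dim=1)                 # (B, feat_dim)

        # first LSTM layer: word embedding + global CAM-pooled feature
        h1, c1 = self.lstm1(
            torch.cat([self.embed(prev_word), global_feat], dim=1), state1)

        # soft attention over the CAM-weighted local features, driven by h1
        scores = self.att_out(torch.tanh(
            self.att_feat(weighted_feats) + self.att_hid(h1).unsqueeze(1)))
        alpha = F.softmax(scores.squeeze(-1), dim=1)            # (B, L)
        attended = (weighted_feats * alpha.unsqueeze(-1)).sum(dim=1)

        # second LSTM layer: fuse global context (h1) with the attended local feature
        h2, c2 = self.lstm2(torch.cat([h1, attended], dim=1), state2)
        return self.logit(h2), (h1, c1), (h2, c2)
```

The split mirrors the "global and local features" wording of the abstract: the first layer carries global, CAM-pooled context, and the second layer refines the prediction with the attended local features.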
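For part 2), the abstract only states that entity attribute annotations are mined from the caption sentences; the exact procedure is not given. The sketch below shows one simple, frequency-based way such labels could be extracted and turned into per-image multi-hot targets, with the stopword list and thresholds chosen purely for illustration.

```python
# Minimal sketch of mining entity/attribute labels from caption annotations
# and turning them into per-image multi-hot targets. The selection criterion
# (token frequency) and the stopword list are illustrative assumptions.
import re
from collections import Counter
from typing import Dict, List

STOPWORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "with", "and", "to"}


def build_entity_vocab(captions_per_image: Dict[str, List[str]], top_k: int = 256) -> List[str]:
    """Collect the top_k most frequent non-stopword tokens as entity labels."""
    counts = Counter()
    for captions in captions_per_image.values():
        for caption in captions:
            for tok in re.findall(r"[a-z]+", caption.lower()):
                if tok not in STOPWORDS:
                    counts[tok] += 1
    return [tok for tok, _ in counts.most_common(top_k)]


def entity_targets(captions: List[str], vocab: List[str]) -> List[int]:
    """Multi-hot vector: 1 if the entity label appears in any caption of the image."""
    tokens = {tok for c in captions for tok in re.findall(r"[a-z]+", c.lower())}
    return [1 if label in tokens else 0 for label in vocab]


if __name__ == "__main__":
    data = {"img_1": ["a man riding a horse on the beach",
                      "a person rides a brown horse near the sea"]}
    vocab = build_entity_vocab(data, top_k=16)
    print(vocab)
    print(entity_targets(data["img_1"], vocab))
```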
Keywords/Search Tags: Image Caption, Attention Mechanism, Encoder-Decoder, Class Activation Mapping, Long Short-Term Memory Network