
Image Caption Research Based On Significant Attention

Posted on: 2021-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: Q C Zhang
Full Text: PDF
GTID: 2428330611951358
Subject: Software Engineering
Abstract/Summary:
With the rapid development of deep learning, the Internet industry is moving ever closer to the original goal of artificial intelligence. Computer Vision and Natural Language Processing let machines simulate human vision and semantic understanding, making deep learning outstanding among artificial intelligence applications. Image Caption, as a task that fuses the two fields, has also drawn wide attention: given an input image, the machine must recognize and understand the objects, their attributes, and their relations, and then generate a semantically and grammatically correct natural language description. Recently, deep learning models built on the encoder-decoder framework have achieved breakthroughs in image caption. For this architecture, this thesis proposes an image caption algorithm based on significant attention, whose main work is as follows:

(1) Following human visual habits, an encoder based on the Faster R-CNN object detection framework is designed to focus on the salient regions of an image and extract their features. This overcomes a defect of a single convolutional neural network: when computing the attention distribution weights, it cannot balance coarse and fine positions on the feature map, and it ignores some highly salient target regions.

(2) A dual-LSTM decoder with an adaptive attention mechanism is proposed, overcoming the simple structure and limited decoding expressiveness of a single-layer LSTM. The mapping is realized through sufficient depth and nonlinear transformation, and the integration of the two parts achieves fusion and improvement.

In summary, the proposed encoding method targets feature extraction from salient regions, while the decoding method combines adaptive attention with the dual-LSTM network, further exploiting attention resources and enhancing language decoding ability. This significant attention mechanism improves the performance of the model. The proposed model and its variants were tested on the classic MS COCO image caption dataset, compared with a variety of advanced mainstream models, and evaluated on the standard image caption metrics BLEU, METEOR, ROUGE, and CIDEr. The experimental results show that the image caption method based on significant attention outperforms the other six mainstream methods.
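The two contributions above can be illustrated together in one small sketch: a handful of region feature vectors stand in for the Faster R-CNN encoder output, and an adaptive attention step of the kind the dual-LSTM decoder would apply mixes them with a "visual sentinel" that lets the decoder fall back on language context. All names, shapes, and weight matrices below are hypothetical illustrations, not the thesis implementation:

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def adaptive_attention(regions, hidden, sentinel, W_v, W_g, w):
    """Adaptive attention over salient-region features (illustrative sketch).

    regions  : k region feature vectors, e.g. pooled from a Faster R-CNN
               style detector (shapes here are hypothetical)
    hidden   : decoder hidden state, e.g. from the second LSTM layer
    sentinel : "visual sentinel" letting the decoder lean on language
               context instead of the image for words like "the" or "of"
    A joint softmax over the k regions plus the sentinel yields the
    attention weights; the context vector is their weighted sum.
    """
    items = list(regions) + [sentinel]
    scores = []
    for v in items:
        z = [math.tanh(a + b) for a, b in zip(matvec(W_v, v), matvec(W_g, hidden))]
        scores.append(dot(w, z))
    alpha = softmax(scores)
    context = [sum(alpha[i] * items[i][j] for i in range(len(items)))
               for j in range(len(sentinel))]
    return context, alpha

# toy dimensions: k salient regions, feature size d, attention size h
random.seed(0)
k, d, h = 5, 8, 6
rand_vec = lambda n: [random.gauss(0, 1) for _ in range(n)]
rand_mat = lambda r, c: [rand_vec(c) for _ in range(r)]
regions = [rand_vec(d) for _ in range(k)]
hidden, sentinel = rand_vec(d), rand_vec(d)
W_v, W_g, w = rand_mat(h, d), rand_mat(h, d), rand_vec(h)
context, alpha = adaptive_attention(regions, hidden, sentinel, W_v, W_g, w)
# alpha[-1] is the sentinel weight: how much the decoder relies on
# language context rather than the image at this decoding step
```

In a full decoder this step would run once per generated word, with `hidden` updated by the two stacked LSTM layers between steps.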
Keywords/Search Tags: Image Caption, Attention Mechanism, Dual LSTM
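Among the metrics the abstract evaluates on, BLEU is the most widely used. The following minimal sketch (a simplified sentence-level BLEU with clipped n-gram precision and a brevity penalty, not the exact evaluation code used in the thesis) shows how a generated caption is scored against reference captions:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions, scaled by a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        total = sum(cand.values())
        if total == 0:
            precisions.append(0.0)
            continue
        # clip each n-gram count by its maximum count in any reference
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        precisions.append(clipped / total)
    if min(precisions) == 0.0:
        return 0.0
    # brevity penalty uses the reference closest in length
    closest = min(references, key=lambda r: abs(len(r) - len(candidate)))
    bp = 1.0 if len(candidate) >= len(closest) else math.exp(1 - len(closest) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "a man is riding a horse".split()
print(bleu(ref, [ref]))                     # a perfect match scores 1.0
print(bleu("a man riding".split(), [ref]))  # too short for any 4-gram: 0.0
```

Standard MS COCO evaluation averages corpus-level statistics over many captions rather than scoring sentences independently, but the clipped-precision idea is the same.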