
Research On Image Caption Based On Attention Mechanism

Posted on: 2021-01-29    Degree: Master    Type: Thesis
Country: China    Candidate: Y S Tao    Full Text: PDF
GTID: 2428330626955619    Subject: Information and Communication Engineering
Abstract/Summary:
The image caption task is an interdisciplinary research problem: it extends deep learning, which has achieved outstanding results in natural language processing, speech recognition, computer vision, and other fields, into the multi-modal data domain. The task is to automatically generate caption text for an input image. Classical image caption architectures attend to the information of the whole image and overlook fine-grained entities. To address this, the attention mechanism is applied to the image caption task: it filters the image information, assigns weights, and provides the most relevant image features for each generated word. Building on the attention mechanism, this thesis covers two topics: bidirectional image captioning and two-channel image captioning. A bidirectional attention network for image captioning is proposed, in which the network also refers to context information when predicting, and the effect of an improved, more logically structured dual attention mechanism on image captioning is studied.

In most image caption models, each word is generated from all of the image information and the preceding words; this input includes much irrelevant image information and ignores the words that follow. To solve this problem, a bidirectional dual-attention network is proposed, consisting of a Bidirectional Long Short-Term Memory network (Bi-LSTM), a dual attention mechanism, and a Convolutional Neural Network (CNN). The Bi-LSTM captures the preceding and following context simultaneously, and the dual attention mechanism screens image features more logically than ordinary attention. First, the CNN extracts image features for the bidirectional dual-attention network; the features are then fed into the Bi-LSTM equipped with the dual attention mechanism, which obtains the salient information and hidden states of the forward and backward passes over the image and generates the caption. The results show that the model's accuracy improves over both a model with attention alone and a model with a Bi-LSTM alone.

A two-channel image caption network is also proposed, which introduces a knowledge enhancement method into the attention-based caption network and assigns it a separate channel so that its influence on the caption method can be measured. In existing image caption methods, only the image is supplied as input; during end-to-end training, internal parameter changes are difficult to observe, which can introduce errors. To further reduce the uncertainty of image captioning, knowledge enhancement is applied: the subject (theme) information of the image is supplied at the input, which constrains the scope of the caption. The proposed two-channel architecture consists of a theme channel and an image channel. The theme channel extracts semantic information, which serves as theme information to enhance the knowledge of the image information; the image channel performs the classical image captioning function. Both channels are encoded and their features extracted by Faster R-CNN, the attention mechanism screens the features, and a Long Short-Term Memory (LSTM) network decodes and predicts the information. Finally, an LSTM fuses the information of the two channels, so that the theme channel enhances the knowledge of the image channel and the caption is generated. Compared with general image caption methods, the results show improved accuracy.
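The per-word attention step described above can be illustrated with a minimal sketch. This assumes additive (Bahdanau-style) attention over regional image features, with hypothetical dimensions and randomly initialized weights (`W_f`, `W_h`, `v`); the thesis's actual attention formulation, feature extractor, and dual-attention details may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(features, hidden, W_f, W_h, v):
    """Additive attention: score each regional image feature against the
    decoder hidden state, normalize the scores with softmax, and return
    the attention-weighted context vector used to predict the next word."""
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v   # (k,) one score per region
    alpha = softmax(scores)                               # (k,) attention weights
    context = alpha @ features                            # (d,) weighted image context
    return context, alpha

# Hypothetical sizes: k regions, feature dim d, hidden dim h, attention dim a
k, d, h, a = 5, 8, 6, 4
features = rng.normal(size=(k, d))   # e.g. CNN / region-proposal features
hidden = rng.normal(size=h)          # decoder (LSTM) hidden state
W_f = rng.normal(size=(d, a))
W_h = rng.normal(size=(h, a))
v = rng.normal(size=a)

context, alpha = attend(features, hidden, W_f, W_h, v)
print(alpha)  # weights are non-negative and sum to 1
```

In a full captioner this step runs once per generated word, with `hidden` updated by the (Bi-)LSTM at each step, so each word attends to a different weighting of the image regions.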
Keywords/Search Tags: Dual attention mechanism, Knowledge enhancement, Two channels, Convolutional Neural Network, Long Short-Term Memory network