Research On The Generation Method Of Chinese Image Description Based On Dual Attention Mechanism

Posted on: 2020-02-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Zhang | Full Text: PDF
GTID: 2438330626464277 | Subject: Computer technology
Abstract/Summary:
Image description combines computer vision and natural language processing, and is a representative example of applying artificial intelligence algorithms to multimodal, cross-domain problems. Most current research in this field is based on neural description models with an encoder-decoder architecture, which suffer from monotonous descriptions, low accuracy, and descriptions that are inconsistent with the image content. In addition, because most open-source datasets are in English, research on image description is mainly conducted in English. Chinese is flexible in syntax and morphology, which makes such algorithms harder to implement, so only a few researchers study Chinese image description. Although the existing research has achieved good results, some problems remain: the sentences generated by the model deviate from the content of the image, the accuracy of the generated scene descriptions is low, and the language is monotonous.

To address these problems, this thesis proposes a method for generating Chinese image descriptions based on a dual attention mechanism. Building on the NIC model, it uses the Inception-v4 network as the encoder and an LSTM network as the decoder, and shows experimentally that an image description model based on visual attention outperforms the NIC model on the Flickr8k-CN and Flickr30k-CN Chinese image description datasets. Because both the NIC model and the visual-attention-based Chinese description model still produce scene descriptions with low accuracy and monotonous language, the thesis then introduces a dual attention mechanism to further optimize the model. In the encoding stage, the visual features of the image and the text features of the text description are extracted by the Inception-v4 network and a two-layer LSTM network, respectively. In the decoding stage, the model attends to the image information and the text information of specific regions, fuses the important information from the two attention mechanisms, and finally outputs the Chinese description of the image through a multi-layer perceptron network. The model can capture more of the important information in the image, which improves the fluency and richness of the Chinese description sentences.

Comparing the convergence of the neural network model, the attention model, and the dual attention model shows that the perplexity at convergence of the dual attention model is 5% and 6% lower than that of the neural network model and the attention model, respectively, which indicates that the descriptions it generates fit the Chinese language context better. Comparing the evaluation metrics of the three models shows that the dual attention model is 10.7%, 6.2%, and 1.8% higher than the single-layer visual attention model on BLEU-4, ROUGE-L, and CIDEr, which indicates that the dual attention model generates richer description sentences. Finally, compared with the B-NIC and F-NIC models in this field, the ROUGE-L metric, which reflects the correlation between image and text, increases by 8.7% and 3.9%, respectively. This further shows that the Chinese descriptions generated by the dual attention method are more natural and more consistent with the image content.
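The decoding step described in the abstract, in which attention is computed separately over image features and text features and the two results are then fused, can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not state the scoring function or the fusion operator, so dot-product scoring and concatenation are assumptions here, and all function names (softmax, attend, dual_attention_context) are hypothetical. In the thesis the features come from Inception-v4 and a two-layer LSTM, and the fused vector is passed to a multi-layer perceptron; here short hand-written vectors stand in for those features.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """Soft attention with dot-product scoring (an assumption, see lead-in).

    features: list of equal-length vectors (image regions or text states)
    query:    decoder hidden state, same length as each feature vector
    Returns the weighted-average context vector and the attention weights.
    """
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return context, weights


def dual_attention_context(visual_feats, text_feats, hidden):
    """Attend over both modalities and fuse the two contexts.

    Fusion by concatenation is an assumption; the abstract only says the
    important information from the two attention mechanisms is fused.
    """
    visual_ctx, visual_w = attend(visual_feats, hidden)
    text_ctx, text_w = attend(text_feats, hidden)
    fused = visual_ctx + text_ctx  # list concatenation
    return fused, visual_w, text_w


if __name__ == "__main__":
    # Toy stand-ins for encoder outputs and the decoder hidden state.
    visual = [[1.0, 0.0], [0.0, 1.0]]
    text = [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]
    hidden = [1.0, 0.0]
    fused, vw, tw = dual_attention_context(visual, text, hidden)
    print("fused context:", fused)
    print("visual weights:", vw)
    print("text weights:", tw)
```

In a full model, the fused vector would be the input to the output network that predicts the next Chinese token, and the scoring function would normally have learned parameters rather than a plain dot product.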
Keywords/Search Tags: computer vision, natural language processing, dual attention mechanism, Chinese description of image