
Image Chinese Caption Generation Based On Visual Attention And Topic Model

Posted on: 2020-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yu
Full Text: PDF
GTID: 2428330572978181
Subject: Computer Science and Technology
Abstract/Summary:
Automatic image caption generation implements a cross-modal conversion from images to natural language. It draws on both computer vision and natural language processing, and is one of the most challenging research topics in artificial intelligence. Currently, the Neural Image Caption (NIC) model achieves good results on the task of automatically generating image captions. However, some problems remain: the generated sentences deviate from the content the image expresses, scene descriptions have low accuracy, and the sentences are monotonous. In addition, the existing datasets and methods for this task are mostly based on English descriptions, so it is necessary to design an image caption system that better fits the Chinese language environment and its pragmatics. To address these problems, this thesis proposes a Chinese image caption generation method based on visual attention and a topic model. The specific work is as follows.

First, an image caption generation model based on visual attention is proposed. Although the encoder-decoder model built on a convolutional neural network and a recurrent neural network has become the mainstream approach to image caption generation, the oversimplified structure of the NIC model leads to deviations between the sentences it generates and the content of the image. To address this, the thesis proposes an improvement on the NIC model: the Inception_v3 network is used as the image encoder, and a two-layer LSTM with a visual attention mechanism is introduced as the sentence decoder. Experiments show that the visual attention-based image caption model outperforms the NIC model on the AIC-ICC Chinese image caption dataset.

Second, the model is further optimized by introducing topic information, yielding an image caption generation model based on a topic model. Both the NIC model and the visual attention model proposed in this thesis still generate scene descriptions that are inaccurate and monotonous. To alleviate this problem, a non-negative matrix factorization (NMF) model is introduced to extract the topic information hidden in the image, and this topic information guides caption generation during decoding. The topic information can be represented in two ways: as a topic probability vector or as a topic word vector.

Finally, the experimental results show that the topic model-based Chinese image caption method outperforms the existing models on every evaluation metric. The variant based on the topic probability vector outperforms the variant based on the topic word vector, with an especially large improvement in vocabulary richness. Concrete examples from the experiments show that the proposed model is effective and can automatically generate Chinese descriptive sentences with more natural wording and richer sentence patterns.
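
The visual attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not specify how attention scores are computed, so the example assumes simple dot-product scoring between each image region feature and the decoder hidden state, and the names `attend`, `region_features`, and `hidden_state` are hypothetical. It shows only the general idea of soft attention: score each region, normalize the scores with a softmax, and form a weighted sum of the region features as the context passed to the decoder.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Soft attention over image regions (illustrative assumption only).

    region_features: list of feature vectors, one per image region
    hidden_state:    current decoder hidden state, same dimension
    Returns the attention weights and the weighted-sum context vector.
    """
    # Assumed scoring function: dot product of region feature and hidden state.
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
```

In this toy example the first and third regions align with the hidden state and receive equal, larger weights, while the second region is down-weighted. In a real decoder the features would come from the image encoder and the scoring function would be learned.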
Keywords/Search Tags:Image Caption, Convolutional Neural Network, Recurrent Neural Network, Attention Mechanism, NMF Topic Model