
Research On Image Caption Algorithm Based On Deep Learning

Posted on: 2019-10-29 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Jiang | Full Text: PDF
GTID: 2428330548494972 | Subject: Software engineering
Abstract/Summary:
With the development of computer vision and multimedia technology, more and more people use images to communicate or express information, since an image conveys information more vividly and memorably than a pure text description. With tens of thousands of images available, classifying and retrieving them efficiently and effectively has become an urgent need. As a key technology of Content-Based Image Retrieval, automatic image annotation reduces manual intervention and labor costs and greatly simplifies image retrieval and management. With the development of deep learning, researchers have begun to extract image features with convolutional neural networks, perform natural language processing with recurrent neural networks, and combine the two to annotate images. However, this approach still does not achieve ideal results and suffers from the "semantic gap" problem. To address the shortcomings of existing image annotation algorithms, and considering why humans can describe images accurately and vividly, this thesis proposes a corpus-based image caption algorithm built on Stanford University's NeuralTalk. The algorithm consists of four parts: the image and semantic alignment model, the word vector training model, the corpus information fusion model, and the Corpus-MRNN model that automatically generates image descriptions. The algorithm uses the word vector model to train the corpus into a distributed representation, and uses the corpus information fusion model to extend the training set by extracting from the corpus the words that have high semantic similarity to the key words of the training set. In this way the algorithm enriches the training set with human language knowledge, which improves the accuracy of image captions and reduces the semantic gap. In addition, to retain as much semantic information as possible in the word embeddings during training, the thesis proposes a Sequence vector training model, based on word2vec, that incorporates word order information. By changing the form of the input data, the model retains as much of the corpus's word order information as possible and improves the performance of the word embeddings. Finally, to verify the proposed algorithm, the thesis compares image captioning with word embeddings trained by the CBOW model against NeuralTalk and against image captioning with word embeddings trained by the Sequence model, using BLEU and METEOR as evaluation metrics. The experimental results show that the corpus-based image caption algorithm improves description accuracy and reduces the semantic gap to a certain extent, and that the algorithm is feasible and effective.
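
The corpus information fusion step described in the abstract, which selects corpus words that are semantically close to the key words of the training set, can be illustrated with a minimal sketch. The abstract does not say how similarity is measured or how the selected words are merged into the training set, so everything here is an assumption made for illustration: cosine similarity as the measure, the `threshold` and `top_k` parameters, the function names, and the hand-written toy vectors, which stand in for embeddings that would actually be trained on a corpus with word2vec.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_words(keyword, embeddings, threshold=0.9, top_k=2):
    """Return up to top_k corpus words whose similarity to keyword is >= threshold.

    threshold and top_k are hypothetical tuning knobs, not values from the thesis.
    """
    if keyword not in embeddings:
        return []
    target = embeddings[keyword]
    scored = [
        (cosine(target, vec), word)
        for word, vec in embeddings.items()
        if word != keyword
    ]
    scored = [(score, word) for score, word in scored if score >= threshold]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top_k]]


def expand_keywords(keywords, embeddings, threshold=0.9, top_k=2):
    """Map each training-set key word to its similar corpus words."""
    return {kw: similar_words(kw, embeddings, threshold, top_k) for kw in keywords}


# Toy 3-dimensional vectors, written by hand purely for illustration.
toy_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.85, 0.15, 0.05],
    "cat": [0.1, 0.9, 0.0],
    "kitten": [0.15, 0.85, 0.05],
    "car": [0.0, 0.1, 0.9],
}

expanded = expand_keywords(["dog", "cat"], toy_embeddings)
print(expanded)
```

In this sketch a caption key word such as "dog" is paired with "puppy", while unrelated words such as "car" fall below the threshold. How the expanded vocabulary is then fed into the Corpus-MRNN model is not described in the abstract and is left out here.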
Keywords/Search Tags:image caption, corpus, word embedding, deep learning