
Cross-modal Information Retrieval Based On Convolutional Neural Network

Posted on: 2019-09-18    Degree: Master    Type: Thesis
Country: China    Candidate: J F Wang    Full Text: PDF
GTID: 2438330563457669    Subject: Computer technology
Abstract/Summary:
With the continuous development of computers, the Internet and mobile communications, a new generation of industrial structure with the theme of "Internet +" is growing. In the era of the Internet and mobile intelligence, people can freely act as publishers and receivers of information anytime and anywhere via the Internet. Information on the Internet is no longer carried by text alone, but by combinations of text, images and videos, which we call multi-modal information. With the continuous popularization of the Internet, multi-modal information is also growing exponentially. Traditional retrieval methods such as semantic retrieval can no longer fully meet the needs of modern information retrieval. Therefore, it is necessary to establish and improve the theories, methods and techniques for cross-modal retrieval. Cross-modal information retrieval has thus emerged, and research on it has important significance and application value.

Since deep learning was first proposed in 2006, a large number of scholars have applied it to image recognition and speech recognition and made extraordinary breakthroughs. A large number of research results show that, compared with traditional machine learning algorithms, deep learning achieves outstanding performance in many fields. In this paper, the convolutional neural network from deep learning is applied to Chinese information retrieval, and the convolutional neural network model is introduced into cross-modal information retrieval. The main work of this paper is as follows:

To achieve cross-modal retrieval between the two most common types of media content, image and text, we must first represent the image and the text separately by feature vectors, that is, map the image data to an image feature space and the text data to a text feature space. However, there is no direct relation between these two feature spaces. The canonical correlation analysis (CCA) algorithm can map the two feature vector spaces into two linearly correlated spaces through training on many image-text pairs. In these two linearly correlated spaces, the similarity between image and text feature vectors can be measured directly, which provides a theoretical basis for image-text cross-modal retrieval. In this paper, the convolutional neural network is applied to image feature extraction, the document topic model LDA is used for text feature extraction, and the feature vectors extracted by the two are used as the input of CCA, so that the model can be trained on the dataset to perform image-text cross-modal retrieval.

In this paper, the Wikipedia Chinese dataset (CH-Wikipedia) and the Sogou Internet Photo Database 2.0 (SogouP2.0) are used as experimental data, and two cross-modal retrieval tasks are used to test the correctness and validity of the proposed cross-modal information retrieval method: (1) retrieving text by image; (2) retrieving image by text. Experimental results show that the cross-modal information retrieval method proposed in this paper achieves the expected results and improves the retrieval accuracy to a certain extent.
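To make the CCA-based retrieval step concrete, the following Python sketch projects CNN image features and LDA topic vectors into linearly correlated spaces and ranks candidates by cosine similarity in both retrieval directions. It is a minimal sketch only: the features are random placeholders, and the dimensions, scikit-learn usage and similarity measure are illustrative assumptions rather than the thesis's own implementation.

```python
# Minimal sketch of CCA-based image-text cross-modal retrieval.
# Assumes CNN image features and LDA topic vectors have already been
# extracted; random placeholders stand in for them here.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
n_pairs = 500                  # number of paired image-text training samples (assumed)
img_dim, txt_dim = 256, 50     # CNN feature size and LDA topic count (assumed)

X_img = rng.normal(size=(n_pairs, img_dim))   # placeholder CNN image features
Y_txt = rng.normal(size=(n_pairs, txt_dim))   # placeholder LDA topic vectors

# Learn projections that map both feature spaces into linearly correlated spaces.
cca = CCA(n_components=10, max_iter=1000)
cca.fit(X_img, Y_txt)

# Project both modalities into the common space.
img_common, txt_common = cca.transform(X_img, Y_txt)

# Image -> text retrieval: rank all texts by cosine similarity to one image query.
query = img_common[:1]
scores = cosine_similarity(query, txt_common)[0]
top_texts = np.argsort(-scores)[:5]           # indices of the 5 best-matching texts

# Text -> image retrieval works symmetrically: use a row of txt_common as the
# query and rank the rows of img_common by cosine similarity.
```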
Keywords/Search Tags:Convolutional Neural Network, Cross-modal Information Retrieval, Canonical Correlation Analysis