
A Cross-Modal Multimedia Retrieval Method Research Based On Deep Learning And Centered Correlation

Posted on: 2017-01-11  Degree: Master  Type: Thesis
Country: China  Candidate: H Zou  Full Text: PDF
GTID: 2308330509459647  Subject: Control Science and Engineering
Abstract/Summary:
An increasing amount of multimedia information of different modalities, including text, audio, video and images, is used on the Internet to describe the same semantic concept. This paper presents a more efficient method for cross-modal multimedia retrieval. Taking images and texts as the research objects, it reviews feature extraction, deep learning, shared representation spaces, and similarity measures between cross-media information, and then proposes a cross-modal multimedia retrieval approach based on deep learning and shared representation space learning. Finally, we adopt centered correlation to measure the distance between images and texts, which improves the average retrieval accuracy. The work of this paper is divided into three parts:

(1) We analyze the differences between shallow learning and deep learning and the advantages of deep learning. We review representation methods for text features, including word vectors, the bag-of-words model and the Latent Dirichlet Allocation (LDA) model, and summarize image feature extraction methods, including color histograms, the bag of visual words, texture features, SIFT and deep learning features. According to the content each feature expresses, we analyze the applicable scenarios of these methods and the advantages of deep features in processing huge amounts of high-dimensional data.

(2) We apply deep learning to cross-media retrieval: a fine-tuned CNN learns deep image features, an LDA model produces text features, and on this basis we design a cross-modal multimedia retrieval framework based on deep learning and shared representation space learning. A probabilistic model maps the two feature spaces into a shared representation space so that they become isomorphic.
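As a minimal sketch of the mapping step: the feature dimensions, sample count, and projection matrices below are hypothetical placeholders (the thesis learns the mapping with a probabilistic model, and the real inputs would be CNN image descriptors and LDA topic vectors), but they illustrate how two heterogeneous feature spaces are projected into one shared space of common dimensionality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for learned features: 4096-d CNN image
# descriptors and 100-d LDA topic vectors for 5 image-text pairs.
img_feats = rng.normal(size=(5, 4096))
txt_feats = rng.normal(size=(5, 100))

d = 10  # dimensionality of the shared representation space
# Placeholder linear projections; the probabilistic model in the
# thesis would learn mappings that make the two spaces isomorphic.
W_img = rng.normal(size=(4096, d))
W_txt = rng.normal(size=(100, d))

img_shared = img_feats @ W_img  # (5, d): images in the shared space
txt_shared = txt_feats @ W_txt  # (5, d): texts in the shared space
```

Once both modalities live in the same d-dimensional space, an image query and a text candidate can be compared directly with a single similarity measure.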
We then measure the correlation between the features of the two modalities in the shared representation space to implement cross-media retrieval.

(3) According to the characteristics of cross-media data, we adopt centered correlation to measure the distance between images and texts, which further improves the average accuracy of cross-media retrieval. Because the natural forms and representation methods of images and texts are completely different, the dimensional difference must be taken into account when measuring similarity in the shared representation space. Moreover, the similarity of these two classes of feature vectors depends on their direction, so this paper de-means the vectors (decentration) to eliminate the influence of the dimensional difference before computing the correlation between image and text, hence "centered correlation". Centered correlation thus not only accounts for the dimensional difference but also respects the direction of the two classes of feature vectors.
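Centered correlation as described above amounts to subtracting each vector's mean and then taking the cosine similarity of the de-meaned vectors (i.e., the Pearson correlation of the two feature vectors). A minimal sketch:

```python
import numpy as np

def centered_correlation(x, y):
    """De-mean both feature vectors, then take their cosine similarity.

    Subtracting each vector's mean (decentration) removes the offset
    caused by the dimensional difference between image and text
    features, while the cosine still captures the vectors' direction.
    """
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

# Any positive affine copy of a vector correlates perfectly with it:
# the +10.0 offset is invisible after de-meaning.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = 2.0 * a + 10.0
print(centered_correlation(a, b))  # → 1.0 (up to rounding)
```

In retrieval, a query image's shared-space vector is scored by centered correlation against every text's shared-space vector, and the texts are ranked by that score.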
Keywords/Search Tags: cross-media retrieval, deep learning, CNNs, shared representation space, centered correlation