
A Cross-Modal Multimedia Retrieval Method Research Based On Deep Learning And Centered Correlation

Posted on: 2017-01-11  Degree: Master  Type: Thesis
Country: China  Candidate: H Zou  Full Text: PDF
GTID: 2308330509459647  Subject: Control Science and Engineering
Abstract/Summary:
An increasing amount of multimedia information of different modalities, including text, audio, video and images, is used on the Internet to describe the same semantic concept. This paper presents a more efficient method for cross-modal multimedia retrieval. Taking images and texts as the research objects, it reviews feature extraction, deep learning, shared representation spaces, and similarity measures between cross-media information, and then proposes a cross-modal multimedia retrieval approach based on deep learning and shared representation space learning. Finally, we adopt centered correlation to measure the distance between images and texts, which improves the average retrieval accuracy. The work of this paper is divided into three parts:

(1) We analyze the differences between shallow learning and deep learning and the advantages of deep learning. We review representation methods for text features, including word vectors, the bag-of-words model and the Latent Dirichlet Allocation (LDA) model, and summarize image feature extraction methods, including color histograms, the bag of visual words, texture features, SIFT and deep learning features. According to the content each feature expresses, we analyze the applicable scenarios of these methods and the advantages of deep features in processing huge amounts of high-dimensional data.

(2) We apply deep learning to cross-media retrieval: a fine-tuned CNN learns deep image features, an LDA model produces text features, and on this basis we design a cross-modal multimedia retrieval framework based on deep learning and shared representation space learning. A probabilistic model maps the two feature spaces into a shared representation space so that they become isomorphic.
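As a minimal sketch of the mapping step: the feature dimensions, sample count, and projection matrices below are hypothetical placeholders (the thesis learns the mapping with a probabilistic model, and the real inputs would be CNN image descriptors and LDA topic vectors), but they illustrate how two heterogeneous feature spaces are projected into one shared space of common dimensionality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for learned features: 4096-d CNN image
# descriptors and 100-d LDA topic vectors for 5 image-text pairs.
img_feats = rng.normal(size=(5, 4096))
txt_feats = rng.normal(size=(5, 100))

d = 10  # dimensionality of the shared representation space
# Placeholder linear projections; the probabilistic model in the
# thesis would learn mappings that make the two spaces isomorphic.
W_img = rng.normal(size=(4096, d))
W_txt = rng.normal(size=(100, d))

img_shared = img_feats @ W_img  # (5, d): images in the shared space
txt_shared = txt_feats @ W_txt  # (5, d): texts in the shared space
```

Once both modalities live in the same d-dimensional space, an image query and a text candidate can be compared directly with a single similarity measure.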
We then measure the correlation between the features of the two modalities in the shared representation space to implement cross-media retrieval.

(3) According to the characteristics of cross-media data, we adopt centered correlation to measure the distance between images and texts, which further improves the average accuracy of cross-media retrieval. Because the natural forms and representation methods of images and texts are completely different, the dimensional difference must be taken into account when measuring similarity in the shared representation space. Moreover, the similarity of these two classes of feature vectors depends on their direction, so this paper de-means the vectors (decentration) to eliminate the influence of the dimensional difference before computing the correlation between image and text, hence "centered correlation". Centered correlation thus not only accounts for the dimensional difference but also respects the direction of the two classes of feature vectors.
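Centered correlation as described above amounts to subtracting each vector's mean and then taking the cosine similarity of the de-meaned vectors (i.e., the Pearson correlation of the two feature vectors). A minimal sketch:

```python
import numpy as np

def centered_correlation(x, y):
    """De-mean both feature vectors, then take their cosine similarity.

    Subtracting each vector's mean (decentration) removes the offset
    caused by the dimensional difference between image and text
    features, while the cosine still captures the vectors' direction.
    """
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

# Any positive affine copy of a vector correlates perfectly with it:
# the +10.0 offset is invisible after de-meaning.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = 2.0 * a + 10.0
print(centered_correlation(a, b))  # → 1.0 (up to rounding)
```

In retrieval, a query image's shared-space vector is scored by centered correlation against every text's shared-space vector, and the texts are ranked by that score.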
Keywords/Search Tags: cross-media retrieval, deep learning, CNNs, shared representation space, centered correlation