Cross-modal Retrieval Research Based On Correlation Analysis And Structure Preserving

Posted on: 2021-01-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M J Zhang
Full Text: PDF
GTID: 1368330602466036
Subject: Management of engineering and industrial engineering
Abstract/Summary:
With the development of Internet technology and the spread of intelligent devices, multi-modal data has grown massively, and cross-modal retrieval has attracted increasing attention. Cross-modal retrieval uses a query in one modality to retrieve semantically relevant data in another modality, for example using an image to retrieve semantically relevant texts, or a text to retrieve semantically relevant images. Because the features of different modalities have different dimensions, their similarity cannot be measured directly; this is the heterogeneous gap. Data in any single modality also exhibits a semantic gap between its low-level feature representation and its high-level semantics. To overcome these two problems, many cross-modal retrieval methods have emerged in recent years, including methods based on subspace learning, hash learning, dictionary learning, and deep learning. The essence of these methods is to learn a common space that, after projection, preserves as much as possible both the correlation between data of different modalities and the structure of the data within each modality. This dissertation studies cross-modal retrieval based on correlation analysis and structure preserving, using subspace learning and deep hashing. The main contributions are as follows:

1. We propose a multi-modal graph regularization based Class Center Discriminant analysis for Cross modal Retrieval (CCDCR) method. To ease the heterogeneous gap and the semantic gap, the method performs content correlation analysis and semantic correlation analysis not only on the training samples but also on the class center samples. To keep the structural information of the multi-modal data unchanged after projection into the common subspace, and thereby further improve retrieval accuracy, the model constructs an inter-modal similarity graph from all paired images and texts, and also constructs an intra-modal similarity graph and an inter-modal similarity graph from the class center samples. The inter-modal similarity graph built from all paired images and texts keeps the semantic relations unchanged after projection. The intra-modal and inter-modal similarity graphs built from the class center samples preserve the local and global structural information of the data, which strengthens the discriminative ability of the model.

2. The CCDCR method considers only the neighbor relationships of samples in general; it does not treat the neighbor relationships of intra-class samples and of inter-class samples separately. To address this defect, we propose a Supervised Graph Regularization based Cross media Retrieval with intra and inter-class correlation (SGRCR) approach. Its core idea is as follows. The approach ensures that paired images and texts are as close as possible to each other after projection (content correlation analysis), and that they are as close as possible to their true semantics after projection (semantic correlation analysis). A graph regularization term is then introduced to minimize the distance between intra-class samples and maximize the distance between inter-class samples, which eases both the heterogeneous gap and the semantic gap.

3. Because deep models are effective at extracting features from multi-modal data, mining the correlation of heterogeneous data, and handling large-scale datasets, we propose a Deep Semantic cross modal hashing with Correlation Alignment (DSCA) method. DSCA uses two separate networks, one for the image modality and one for the text modality, and learns two hash functions. First, we construct a new similarity measure for multi-label datasets that exploits the semantic information well and improves retrieval accuracy; at the same time, we preserve the inter-modal similarity of the heterogeneous data features, which exploits the semantic correlation. Second, the distributions of the heterogeneous data are aligned in order to mine the heterogeneous correlation. Third, the semantic label information is embedded in the hash layer of the text network, which makes the learned hash matrix more stable and the hash codes more discriminative.

4. The DSCA method loses information because it applies the same metric to data of different modalities. To address this defect, we propose a Deep semantic hashing with Modal-Specific Similarity Preserving for cross modal retrieval (DMSSP) method. In DMSSP, we first build the inter-modal similarity from a weighted combination of cosine distance and Euclidean distance. We then build the text intra-modal similarity with the cosine distance function and the image intra-modal similarity with the Euclidean distance function, which improves retrieval accuracy by respecting the characteristics of each modality. The model therefore considers both inter-modal and intra-modal similarity, which eases the heterogeneous correlation problem. It also incorporates semantic information embedding, a quantization loss, and a bit-balance constraint. Finally, experiments on two public datasets show that the proposed DMSSP method outperforms eight state-of-the-art methods.
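The distance choices and hash constraints described for DMSSP can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract gives no formulas, so the weight `alpha`, all function names, and the exact forms of the quantization and bit-balance terms are assumptions, using forms commonly seen in deep hashing. It also assumes that the image and text vectors have already been mapped to a common dimension.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention chosen here for zero vectors (assumption)
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def inter_modal_distance(img_vec, txt_vec, alpha=0.5):
    """Weighted mix of cosine and Euclidean distance.

    alpha is a hypothetical weight; the abstract does not state its value.
    """
    return (alpha * cosine_distance(img_vec, txt_vec)
            + (1.0 - alpha) * euclidean_distance(img_vec, txt_vec))


def quantization_loss(outputs):
    """Mean squared gap between real-valued outputs and their +/-1 codes."""
    total, count = 0.0, 0
    for row in outputs:
        for h in row:
            b = 1.0 if h >= 0 else -1.0  # binarize by sign
            total += (h - b) ** 2
            count += 1
    return total / count


def bit_balance_loss(outputs):
    """Mean squared per-bit average over samples.

    Zero when every bit is +1 and -1 equally often across the samples.
    """
    n = len(outputs)
    n_bits = len(outputs[0])
    means = [sum(row[k] for row in outputs) / n for k in range(n_bits)]
    return sum(m * m for m in means) / n_bits


if __name__ == "__main__":
    img = [0.6, 0.8]   # toy image embedding (made-up values)
    txt = [0.8, 0.6]   # toy text embedding (made-up values)
    print("inter-modal:", inter_modal_distance(img, txt, alpha=0.5))
    print("text intra-modal (cosine):", cosine_distance(txt, [1.0, 0.0]))
    print("image intra-modal (Euclidean):", euclidean_distance(img, [1.0, 0.0]))
```

In this sketch the text intra-modal relation uses `cosine_distance` and the image intra-modal relation uses `euclidean_distance`, matching the modality-specific choice stated in the abstract. How these distances are turned into similarities and combined into a training objective is left open because the abstract does not specify it.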
Keywords/Search Tags:Cross modal retrieval, Subspace learning, Supervised graph regularization, Intra-modal similarity, Inter-modal similarity, Hashing learning, Convolutional neural network, Correlation alignment, Semantic embedding