
Research on Cross-Modal Learning Algorithms for Image-Text Retrieval

Posted on: 2021-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: E Yu
GTID: 2428330602964576
Subject: Communication and Information System
Abstract/Summary:
In recent years, multimedia data has grown explosively and appears on the Internet in many different forms. Correlation analysis and processing of multimedia data have become a significant research topic. Within this area, cross-modal retrieval has attracted widespread attention in both industry and academia. Unlike traditional single-modal information retrieval, cross-modal retrieval uses query data in one modality to retrieve semantically relevant instances from other modalities. However, different modalities usually lie in different feature spaces, and it is difficult to correlate low-level features with high-level semantics. This leads to the two basic challenges of cross-modal retrieval: how to align the low-level "heterogeneous representation" and how to bridge the high-level "semantic gap". This thesis therefore focuses on these basic problems and on the related problems of automatic annotation, data deficiency, and retrieval efficiency. The main work and contributions can be summarized as follows:

1. To address heterogeneous representation and the semantic gap, this thesis proposes multi-class joint subspace learning for cross-modal retrieval. Most existing cross-modal retrieval algorithms ignore the discrepancy in semantic distribution among different categories. The proposed algorithm therefore learns specific projection matrices for specific tasks and categories, which distinguishes the semantic distributions of the different classes. It also exploits the information shared where the semantics of classes overlap, through the proposed joint learning strategy. In the retrieval stage, a pre-trained linear classifier adaptively associates each query sample with its optimal projection matrix, so that the heterogeneous data can be mapped into the latent semantic subspace, where similarity is measured and the final retrieval results are returned.

2. To address data deficiency and data annotation, this thesis proposes a semi-supervised cross-modal retrieval algorithm based on pseudo data generation. The algorithm reconstructs pseudo data from cluster centers, which helps to construct training data. In addition, a mapping table between cluster centers and unlabeled samples is established as the basic criterion for label prediction. Finally, semantic analysis, correlation analysis, and feature selection are integrated into a joint cross-modal retrieval framework for subspace learning.

3. To further improve the learning efficiency and accuracy of semi-supervised algorithms, this thesis proposes adaptive semi-supervised feature selection for cross-modal retrieval. Traditional semi-supervised algorithms ignore the dynamic optimization between label prediction and subspace learning, which reduces the performance of semi-supervised learning for cross-modal retrieval. The proposed algorithm therefore defines semantics-aware local graph models for the different tasks in order to fuse the two processes of label prediction and subspace learning. It preserves the semantic and structural attributes of the raw features and also ensures the accuracy of label prediction on specific tasks. As a result, the performance of semi-supervised cross-modal retrieval is greatly improved.

4. To address retrieval speed and storage efficiency in the big data era, this thesis proposes cross-modal transfer hashing based on coherent projection. The algorithm first embeds heterogeneous information by a linear mapping and introduces the concept of coherent projection to ensure that heterogeneous information is inherited from the raw feature space by the hash space. In addition, an anchor graph model with linear complexity is introduced to further exploit the structural relationships of the raw features, so that the learned unified hash codes better preserve the intrinsic distribution of the raw features.
Keywords/Search Tags: Cross-Modal Retrieval, Semi-Supervised Learning, Hashing, Feature Selection, Graph Model
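
The retrieval step behind hashing-based cross-modal retrieval (the fourth contribution) can be illustrated with a minimal sketch. It assumes each modality has its own linear projection into a shared space, that hash codes are obtained by taking the sign of the projected values, and that results are ranked by Hamming distance. The projection matrices and feature vectors below are hand-picked toy values, not learned ones, and the names hash_code, hamming, and retrieve are hypothetical. The thesis's coherent projection and anchor graph terms are not implemented here.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def hash_code(features: Vector, projection: Matrix) -> List[int]:
    """Linearly project a feature vector, then binarize each output by sign."""
    code = []
    for row in projection:
        activation = sum(w * x for w, x in zip(row, features))
        code.append(1 if activation >= 0 else 0)
    return code


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code: Sequence[int],
             database_codes: Sequence[Sequence[int]],
             top_k: int = 3) -> List[int]:
    """Return indices of the top_k database codes closest to the query."""
    ranked = sorted(range(len(database_codes)),
                    key=lambda i: hamming(query_code, database_codes[i]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy, hand-picked projections (assumption): 3-dim image features and
    # 2-dim text features are both mapped to 4-bit codes.
    image_projection = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0],
                        [-1.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
    text_projection = [[1.0, -1.0], [0.0, 1.0], [-1.0, 1.0], [1.0, 1.0]]

    image_features = [[0.9, 0.1, 0.2], [0.1, 0.8, 0.3], [-0.5, -0.4, 0.7]]
    image_codes = [hash_code(f, image_projection) for f in image_features]

    # A text query retrieves images by comparing codes in the shared hash space.
    text_query = [0.2, 0.9]
    query_code = hash_code(text_query, text_projection)
    print(retrieve(query_code, image_codes, top_k=2))
```

Binary codes keep storage small, and Hamming distance is cheap to compute, which is the usual motivation for hashing when retrieval speed and storage efficiency matter.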