
Study Of Cross-modal Hashing Algorithms With Applications

Posted on: 2021-03-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Yu
GTID: 1368330647961794
Subject: Control Science and Engineering
Abstract/Summary:
With the development of information technologies, the world has entered the era of multimedia big data. Information is usually associated with multiple modalities such as image, text, video, and audio. Different modalities depict the same object in different forms while expressing the same semantic information. The rich complementary information in cross-modal data has great economic value and brings opportunities for social development. Because cross-modal data lie in heterogeneous high-dimensional spaces and the modalities are correlated, it is important to effectively learn a latent low-dimensional space shared across modalities. This dissertation takes cross-modal data as the research object and uses hashing techniques to design cross-modal hashing methods for cross-modal retrieval, image retrieval, text retrieval, and multimedia retrieval. The main research contents of this dissertation are summarized as follows:

Firstly, a cross-modal supervised hashing model based on matrix factorization is proposed. To improve the discriminative ability of the model, we use the known label information to learn class-specific attributes, so that the learned hash features retain the attribute information of the corresponding categories. A non-linear kernel mapping maintains the intra-modal similarity and captures the non-linear structure of the samples. The proposed model jointly performs classifier learning, subspace learning, and label-consistent matrix factorization to learn discriminative unified hash features.

Secondly, a cross-modal supervised hashing model based on multi-view features is presented. Because a single view feature has limited representation ability, it provides insufficient information for downstream tasks and limits model performance. To learn compact hash codes more effectively, the proposed model uses multiple view features to represent cross-modal data, which enriches the description information. Extensive experimental results on various search tasks show that multi-view features can greatly improve retrieval performance.

Thirdly, an online cross-modal fused hashing model based on the Hadamard matrix is developed. Traditional cross-modal fused hashing models have many hyper-parameters, which makes it difficult to set them optimally. To solve this problem, we propose a hashing method based on the Hadamard matrix. This method is simple yet effective, involves few hyper-parameters, and preserves the discriminative semantic information while learning hash codes. During online search, cross-modal dynamic information is captured adaptively. The experimental results show that this method achieves superior accuracy and efficiency in multimedia retrieval and can be applied flexibly because it is not sensitive to hyper-parameters.

Fourthly, a cross-modal semi-paired hashing model based on label propagation is proposed. Most existing cross-modal methods assume that cross-modal data are well aligned. However, fully aligned data are not universal in practice. Taking the limited amount of labeled data into account as well, we propose a semi-paired and semi-supervised hashing method. This method constructs the cross-modal similarity based on anchor samples to generate pseudo-labels for unlabeled data and combines feature learning with classifier learning to learn hash codes. The experimental results under both semi-paired and fully-paired settings verify the effectiveness of this method in cross-modal retrieval tasks.

Fifthly, a cross-modal unsupervised hashing model based on multi-modal graph embedding is designed. Most existing unsupervised cross-modal hashing methods do not consider feature learning and geometric structure preservation simultaneously when learning hash codes. To tackle this problem, the proposed method embeds the local linear neighborhood graph constructed in the visual space and the semantic relation graph constructed in the text space directly into the hash codes. At the same time, an ℓ2,1-norm constraint term is employed to learn compact hash features. The experimental results on standard datasets show that the model combining graph embedding and feature learning achieves a significant improvement in retrieval performance.

To sum up, across multiple cross-modal data scenarios, this dissertation proposes five different cross-modal hashing methods that make full use of the complementarity, semantic association, and geometric structure of cross-modal data to improve accuracy and efficiency in cross-modal search, image search, text search, and multimedia search applications. Extensive experimental results show that the proposed methods achieve better performance than existing related methods.
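
As a rough illustration of the Hadamard-based idea in the third contribution, the sketch below shows one common way such codes can be used. It is not the dissertation's model: the abstract gives no formulas, so the offline ridge-regression fit, the synthetic features, and all function names (`sylvester_hadamard`, `fit_projection`, `hash_codes`, `hamming_distance`) are assumptions made for this example. Each class is assigned one row of a Hadamard matrix as its target code, a linear projection is fitted per modality, and retrieval ranks database items by Hamming distance.

```python
import numpy as np


def sylvester_hadamard(n: int) -> np.ndarray:
    """Return an n x n Hadamard matrix via Sylvester's construction (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h


def fit_projection(x: np.ndarray, targets: np.ndarray, reg: float = 1.0) -> np.ndarray:
    """Ridge regression from features x (N x d) to +/-1 target codes (N x bits)."""
    d = x.shape[1]
    return np.linalg.solve(x.T @ x + reg * np.eye(d), x.T @ targets)


def hash_codes(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Binarize projected features into +/-1 codes."""
    return np.where(x @ w >= 0, 1, -1)


def hamming_distance(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Hamming distance between one +/-1 code and every row of a +/-1 code matrix."""
    bits = database.shape[1]
    # For +/-1 codes: dot product = bits - 2 * (number of differing bits).
    return (bits - database @ query) // 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_classes, bits, n_samples = 4, 16, 200
    labels = rng.integers(0, n_classes, size=n_samples)
    # Assumption: each class uses one Hadamard row as its target code.
    targets = sylvester_hadamard(bits)[labels]

    # Synthetic stand-ins for paired image (32-d) and text (10-d) features.
    img = rng.normal(size=(n_classes, 32))[labels] + 0.1 * rng.normal(size=(n_samples, 32))
    txt = rng.normal(size=(n_classes, 10))[labels] + 0.1 * rng.normal(size=(n_samples, 10))

    w_img = fit_projection(img, targets)
    w_txt = fit_projection(txt, targets)

    database = hash_codes(img, w_img)       # image codes form the database
    query = hash_codes(txt[:1], w_txt)[0]   # a text sample is the query
    ranking = np.argsort(hamming_distance(query, database))
    print("query label:", labels[0], "top-5 retrieved labels:", labels[ranking[:5]])
```

Because the rows of a Hadamard matrix are mutually orthogonal, any two distinct class codes differ in exactly half of their bits, and the only free parameter in this sketch is the ridge weight `reg`. An online variant would update the projections as new batches arrive, which this sketch does not attempt.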
Keywords/Search Tags: cross-modal learning, hashing learning, graph embedding, online hashing, semi-paired hashing