
Research On New Methods Of Cross-Modal Retrieval Via Hash Learning

Posted on: 2022-09-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D L Zhang
GTID: 1488306725451554
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of multimedia technology, smart devices and social media, large volumes of multimedia data are pouring into the Internet. These data contain rich social information and have important economic value, which provides new opportunities for social progress and economic development. Facing the rapid growth of multimedia data, how to efficiently search, store and utilize these data remains a challenging problem. Hashing-based retrieval methods have emerged to address this problem, and owing to their low storage cost and high computational efficiency they have received increasing attention. Most traditional hashing approaches are designed for single-modal retrieval. However, with the development of multimedia technology, many applications involve data of more than one modality (e.g., texts and images), and these multimodal data exhibit diverse structures, high dimensionality and semantic intersection. Therefore, how to design effective cross-modal retrieval algorithms that facilitate interaction with such data still needs further study. Moreover, existing cross-modal hashing methods still have several problems that need to be explored. To this end, this thesis fully considers the characteristics of multimodal data as well as the deficiencies of existing methods, and systematically studies cross-modal hashing retrieval methods. The main contributions of this thesis are summarized as follows.

(1) A two-stage cross-modal hashing method based on label relaxation is proposed. Most existing cross-modal hashing methods learn the hash functions and hash codes simultaneously in a unified framework, i.e., a one-stage model. However, this makes the optimization very difficult and limits the flexibility of the hash functions. In addition, most existing supervised approaches use a strict binary label matrix (i.e., entries of 0 and 1), which leaves only a small, fixed gap between 0 and 1 and increases the risk of classification error. Moreover, to handle the discrete constraint problem, many methods adopt a relaxation scheme, which causes large quantization loss. To address these issues, a two-stage hashing method based on smooth matrix factorization and label relaxation is proposed. The proposed method adaptively controls the margins through a novel label relaxation strategy, which can reduce the quantization loss. The method is a two-stage model. In stage 1, the hash codes are learned discretely with a discrete smooth matrix factorization model, reducing the quantization loss caused by relaxation. In stage 2, a semantic embedding hash function learning strategy is proposed, which learns more effective hash functions. Experimental results demonstrate that the proposed method achieves good performance.

(2) A robust and discrete matrix factorization hashing method for cross-modal retrieval is presented. Most existing cross-modal hashing methods adopt the l2-norm in the objective function, which can amplify errors and makes these methods sensitive to noise and outliers. In addition, many supervised methods use a large affinity matrix (of size n × n, where n is the number of samples) to preserve similarity, leading to high space and time complexity. To solve these problems, a cross-modal hashing method based on robust and discrete matrix factorization is proposed, which adopts a two-stage strategy. In stage 1, a discrete matrix factorization scheme and the l2,1-norm are introduced to make the model insensitive to noise. Meanwhile, the hash codes are obtained discretely without intermediate variables, avoiding unnecessary quantization loss. In addition, a scheme that directly correlates the hash codes with the label matrix is proposed, which avoids manipulating the large similarity matrix and reduces the complexity. In stage 2, a semantic autoencoder scheme is proposed for hash function learning, making the hash functions more powerful. Comprehensive experiments demonstrate that the method achieves better search performance than several existing methods.

(3) An unequal-length discrete asymmetric cross-modal hashing method is devised. Existing cross-modal hashing methods usually represent multimedia data with an equal-length encoding scheme. However, because multimedia data structures are diverse, equal-length encoding may not fully characterize them. Cross-modal retrieval systems also face other challenges, e.g., how to handle the discrete constraints and how to effectively exploit discriminative label information without using the large similarity matrix. To solve these problems, this thesis proposes a discrete asymmetric cross-modal hashing method that handles not only the traditional equal-length encoding scenario but also the novel unequal-length encoding scenario. In addition, a scheme that minimizes the distance-distance difference is developed to construct a supervised semantic embedding framework, which greatly reduces the complexity. Meanwhile, an asymmetric strategy is employed to connect the hash codes with the latent subspace. In the second stage of the method, a semantic intersection scheme is proposed to learn the hash functions, yielding more powerful hash functions. Extensive experiments show that the method achieves good performance in equal-length encoding scenarios and is also effective in unequal-length encoding scenarios, improving the flexibility of cross-modal retrieval in real applications.

(4) A multi-hash-code joint learning method for cross-modal hashing is developed. Existing cross-modal hashing methods predefine a fixed hash length (e.g., 16 or 32 bits) before learning the hash codes. However, these methods must be retrained whenever the hash length changes, which consumes additional computation and reduces scalability in real applications. In addition, existing methods exploit only the information in the original multimedia data for hash learning and ignore the rich information contained in the learned hash codes. To solve these problems, a multi-hash-code joint learning method for cross-modal retrieval is proposed, which learns hash codes of multiple lengths simultaneously in a unified framework. To enhance discrimination, the method combines clues from the longer hash representations, the multimodal data and the semantic labels for hash learning. The proposed method is the first work to learn hash codes of different lengths simultaneously without retraining. Experiments on several datasets demonstrate that the method outperforms several competitive methods.

(5) A semantic autoencoder method for cross-modal hashing is designed. Most existing cross-modal approaches update the hash functions in batch mode and cannot efficiently handle online streaming multimedia data. Online hashing methods address this problem by incrementally updating the hash functions with an online learning strategy. Nevertheless, existing online cross-modal hashing methods still face several challenges, e.g., how to construct the relationships between newly arriving data and existing data, how to fully and effectively utilize the supervised information, and how to learn powerful hash functions. To overcome these challenges, a semantic autoencoder cross-modal hashing method is proposed. Specifically, the method uses a semantic autoencoder scheme to establish the relationships between hash codes and labels. Meanwhile, a label inner product scheme builds the connection between newly arriving data and existing data, making the optimization less sensitive to the newly arriving data. In the hash function learning stage, the developed semantic scheme learns more effective hash functions. Comprehensive experiments on several datasets show that the proposed method achieves good search performance.
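The abstract credits hashing with low storage cost and high computational efficiency. The sketch below shows why, assuming the codes are stored as {-1, +1} vectors of length r. For such codes, the Hamming distance between two vectors equals (r - inner product) / 2, so a single matrix product ranks a text query against a whole image database. The function name and the 8-bit codes are hypothetical. A production system would more likely pack the bits and use popcount.

```python
import numpy as np

def hamming_rank(query_codes: np.ndarray, db_codes: np.ndarray):
    """Return (ranking, distances) of database items for each query.

    Both inputs are {-1, +1} arrays of shape (num_items, r). For such codes
    the Hamming distance is (r - <b_q, b_db>) / 2, so one matrix product
    gives every query-item distance.
    """
    r = query_codes.shape[1]
    dist = (r - query_codes @ db_codes.T) / 2
    # A stable sort keeps the database order among equally distant items.
    order = np.argsort(dist, axis=1, kind="stable")
    return order, dist

# Hypothetical 8-bit codes: one text query against three image codes.
text_query = np.array([[1, 1, -1, -1, 1, -1, 1, 1]])
image_db = np.array([
    [1, 1, -1, -1, 1, -1, 1, -1],   # differs from the query in 1 bit
    [-1, -1, 1, 1, -1, 1, -1, -1],  # differs in all 8 bits
    [1, 1, -1, -1, 1, -1, 1, 1],    # identical to the query
])
order, dist = hamming_rank(text_query, image_db)
print(dist)   # [[1. 8. 0.]]
print(order)  # [[2 0 1]]
```

The inner-product form works for any {-1, +1} encoding, and it is one reason the code length r (16, 32, 64 bits) directly sets both the storage and the search cost.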
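Contributions (1) and (2) argue that relaxing the binary constraint and rounding afterwards causes quantization loss. The sketch below uses hypothetical random relaxed codes V and measures that loss as ||B - V||_F^2 with B = sign(V). It then checks on a tiny case that sign rounding is already the best binary choice for a fixed V. The gap can therefore only be avoided by optimizing B directly in the binary domain, which is the discrete approach the abstract describes. The helper names are illustrative.

```python
import itertools
import numpy as np

def quantize(V):
    """Round relaxed codes to {-1, +1} (ties at 0 go to +1)."""
    return np.where(V >= 0, 1.0, -1.0)

def quantization_loss(B, V):
    """Squared Frobenius gap between binary codes B and relaxed codes V."""
    return float(np.sum((B - V) ** 2))

rng = np.random.default_rng(4)
V = rng.standard_normal((16, 50))   # hypothetical relaxed codes: 16 bits x 50 samples
B = quantize(V)
print(quantization_loss(B, V))      # > 0 whenever some |v| != 1

# Tiny exhaustive check: no binary B beats sign rounding for a fixed V.
V_small = np.array([[0.3, -1.7], [2.0, -0.1]])
best = min(
    quantization_loss(np.array(bits, dtype=float).reshape(2, 2), V_small)
    for bits in itertools.product([-1, 1], repeat=4)
)
print(np.isclose(best, quantization_loss(quantize(V_small), V_small)))  # True
```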
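Contribution (1) replaces the strict 0/1 label matrix with relaxed labels, but the abstract does not give the thesis's relaxation formula. For illustration only, the sketch below uses a known technique with the same aim: ε-dragging from discriminative least squares regression. The target of a positive label may be dragged above 1 and that of a negative label below 0, giving T = Y + D ⊙ M with D = 2Y - 1 and M ≥ 0. For fixed regression outputs R, minimizing ||R - T||_F^2 over M ≥ 0 has the closed form M = max(D ⊙ (R - Y), 0). The function and variable names are assumptions.

```python
import numpy as np

def relax_labels(Y, R):
    """epsilon-dragging style relaxation (illustrative, not the thesis's formula).

    Y: strict {0, 1} label matrix; R: current regression outputs, same shape.
    Returns relaxed targets T = Y + D * M, where D = 2Y - 1 is the allowed
    drag direction and M >= 0 is the closed-form optimal margin.
    """
    D = 2 * Y - 1
    M = np.maximum(D * (R - Y), 0.0)
    return Y + D * M

Y = np.array([[1, 0, 0], [0, 1, 1]], dtype=float)
R = np.array([[1.4, -0.3, 0.2], [0.6, 1.1, 0.9]])
T = relax_labels(Y, R)
print(T)
# Outputs already beyond their target (1.4 for a positive, -0.3 for a negative)
# are no longer penalized; the others keep their strict 0/1 targets.
```

Because each entry is relaxed only in the direction its label allows, the regression loss against T is never larger than against the strict Y. This is how a relaxation of this kind enlarges the margin between classes.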
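Contribution (2) replaces the l2-norm loss with the l2,1-norm. With one sample per row of a residual matrix E, ||E||_{2,1} = Σ_i ||e_i||_2 grows only linearly with the magnitude of a corrupted row. The squared Frobenius (least-squares) loss grows quadratically. The sketch below injects one outlier into hypothetical random residuals and compares how much each loss increases. If samples are stored as columns, take the norms along axis 0 instead.

```python
import numpy as np

def l21_norm(E):
    """||E||_{2,1}: sum of the l2 norms of the rows of E (one sample per row)."""
    return np.linalg.norm(E, axis=1).sum()

def squared_frobenius(E):
    """||E||_F^2: the usual least-squares (l2) loss."""
    return np.sum(E ** 2)

rng = np.random.default_rng(0)
E = 0.1 * rng.standard_normal((100, 8))   # small residuals for 100 samples
E_outlier = E.copy()
E_outlier[0] += 10.0                        # corrupt a single sample

fro_ratio = squared_frobenius(E_outlier) / squared_frobenius(E)
l21_ratio = l21_norm(E_outlier) / l21_norm(E)
print(f"squared Frobenius grows x{fro_ratio:.1f}, l2,1 grows x{l21_ratio:.1f}")
```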
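Contributions (2) and (3) avoid the n × n affinity matrix. The abstract does not give the thesis's exact formulation. The sketch below shows one general reason this is possible, assuming the similarity is the label inner product S = Y Yᵀ with one sample per row of Y. Products such as S B, or the trace term tr(Bᵀ S B) common in similarity-preserving objectives, can be reassociated through Y. That costs O(ncr) time and O(cr) extra memory instead of O(n²). The sizes and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, c, r = 1000, 10, 32                          # samples, classes, code bits (illustrative)
Y = (rng.random((n, c)) < 0.2).astype(float)    # multi-label matrix, one sample per row
B = np.where(rng.standard_normal((n, r)) >= 0, 1.0, -1.0)  # {-1, +1} codes, one per row

# Naive route: materialize the n x n label-similarity matrix (O(n^2) memory).
S = Y @ Y.T
naive_SB = S @ B
naive_trace = np.trace(B.T @ S @ B)

# Reassociated route: never forms S; only the c x r matrix Y^T B is created.
YtB = Y.T @ B
fast_SB = Y @ YtB
fast_trace = np.sum(YtB ** 2)   # tr(B^T Y Y^T B) = ||Y^T B||_F^2

print(np.allclose(naive_SB, fast_SB), np.isclose(naive_trace, fast_trace))  # True True
```

With millions of samples, S alone would need terabytes of memory, whereas Yᵀ B stays c × r regardless of n.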
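Contributions (2) and (5) use a semantic autoencoder, but the abstract gives no objective. The sketch below therefore assumes the form of the semantic autoencoder of Kodirov et al. (2017), with hash codes in place of visual features. An encoder W maps r-bit codes B to labels Y, and its transpose decodes labels back to codes: min_W ||B - Wᵀ Y||_F² + λ ||W B - Y||_F². Setting the gradient to zero gives the Sylvester equation (Y Yᵀ) W + W (λ B Bᵀ) = (1 + λ) Y Bᵀ. The function below solves it by diagonalizing both symmetric coefficient matrices; `scipy.linalg.solve_sylvester` would work equally well. The function name and data are hypothetical.

```python
import numpy as np

def semantic_autoencoder(B, Y, lam=1.0):
    """Closed-form W for min_W ||B - W.T @ Y||^2 + lam * ||W @ B - Y||^2.

    B: (r, n) hash codes in {-1, +1}; Y: (c, n) label matrix.
    Solves (Y Y^T) W + W (lam B B^T) = (1 + lam) Y B^T via eigendecompositions.
    """
    a, U = np.linalg.eigh(Y @ Y.T)          # Y Y^T = U diag(a) U^T, a >= 0
    b, V = np.linalg.eigh(lam * (B @ B.T))  # lam B B^T = V diag(b) V^T
    Q = (1 + lam) * (Y @ B.T)
    # In the rotated basis the equation decouples entry by entry:
    # (a_i + b_j) * Wt_ij = (U^T Q V)_ij; requires a_i + b_j != 0.
    Wt = (U.T @ Q @ V) / (a[:, None] + b[None, :])
    return U @ Wt @ V.T

rng = np.random.default_rng(2)
r, c, n, lam = 16, 5, 200, 0.5
B = np.where(rng.standard_normal((r, n)) >= 0, 1.0, -1.0)
Y = (rng.random((c, n)) < 0.3).astype(float)
W = semantic_autoencoder(B, Y, lam)

lhs = Y @ Y.T @ W + lam * W @ B @ B.T
rhs = (1 + lam) * Y @ B.T
print(np.allclose(lhs, rhs))  # True
```

The eigendecomposition route needs λ > 0 and a full-row-rank B, so that λ B Bᵀ is positive definite and no denominator a_i + b_j vanishes. Once fitted, W @ B predicts labels from codes, and W.T @ Y reconstructs codes from labels.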
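Contribution (5) updates the hash functions online. The sketch below shows one generic way to do that without revisiting old data. It assumes a linear hash function h(x) = sign(W x) fitted by ridge regression to learned codes, and keeps running sums of X Xᵀ and B Xᵀ over the incoming chunks. The refitted W then equals the batch solution exactly. A third running sum, Y Bᵀ, lets the label inner product between a new chunk and all previous data (S = Y_newᵀ Y_old) be applied to the old codes without storing them. The class and method names are hypothetical, and this is not the thesis's algorithm.

```python
import numpy as np

class OnlineLinearHash:
    """Hypothetical streaming learner for a linear hash function W (r x d).

    Only running sums are stored, so earlier chunks never have to be revisited.
    """

    def __init__(self, d, r, c, lam=1.0):
        self.lam = lam
        self.XXt = np.zeros((d, d))   # sum over chunks of X_t X_t^T
        self.BXt = np.zeros((r, d))   # sum over chunks of B_t X_t^T
        self.YBt = np.zeros((c, r))   # sum over chunks of Y_t B_t^T

    def label_similarity_times_old_codes(self, Y_new):
        """(Y_new^T Y_old) @ B_old^T from the running sum; call before update()."""
        return Y_new.T @ self.YBt

    def update(self, X, B, Y):
        """Absorb one chunk: X (d, m) features, B (r, m) codes, Y (c, m) labels."""
        self.XXt += X @ X.T
        self.BXt += B @ X.T
        self.YBt += Y @ B.T

    def weights(self):
        """Ridge solution W = (sum B X^T)(sum X X^T + lam I)^{-1}."""
        d = self.XXt.shape[0]
        return np.linalg.solve(self.XXt + self.lam * np.eye(d), self.BXt.T).T

    def hash(self, X):
        """Binary codes in {-1, +1} for the columns of X."""
        return np.where(self.weights() @ X >= 0, 1, -1)

rng = np.random.default_rng(3)
d, r, c = 20, 16, 4
X = rng.standard_normal((d, 300))
B = np.where(rng.standard_normal((r, 300)) >= 0, 1.0, -1.0)
Y = (rng.random((c, 300)) < 0.3).astype(float)

model = OnlineLinearHash(d, r, c, lam=0.1)
for start in range(0, 300, 100):   # three chunks arriving one after another
    model.update(X[:, start:start + 100], B[:, start:start + 100], Y[:, start:start + 100])

W_batch = B @ X.T @ np.linalg.inv(X @ X.T + 0.1 * np.eye(d))
print(np.allclose(model.weights(), W_batch))  # True
```

The per-chunk cost depends only on the chunk size and on d, r and c, not on how much data has already streamed past. That is the property online hashing relies on.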
Keywords/Search Tags: multimedia data, hash learning, cross-modal retrieval, discrete optimization, online hashing