
Research On Cross-Modal Retrieval Based On Latent Semantic Space Learning

Posted on: 2020-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Zhu
Full Text: PDF
GTID: 2428330590995462
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the advent of the era of big data, multimedia data is growing explosively, and single-modal retrieval technology can no longer meet users' needs. Cross-modal retrieval has therefore emerged and become a mainstream topic in the information retrieval field, with both theoretical and practical significance. The basic task of cross-modal retrieval is to establish pair-wise relationships between data of different modalities, so that data in one modality can be used to retrieve semantically similar data in another. The main research question is how to model image-text pairs, mapping cross-modal data into a latent semantic space. This reduces the semantic gap between modalities, so that relevant text can be retrieved from an input image, or relevant images from input text. To realize mutual retrieval over large-scale, high-dimensional multimedia data, this paper proposes three cross-modal retrieval techniques based on latent semantic space learning:

1. An All Similarity Preserving Cross-Modal Hashing (ASPCH) method. This method maps images and text into a latent semantic space while using supervised label information to constrain both the inter-modal and the intra-modal semantic representations, improving retrieval precision. Intra-modal similarity is preserved with a nearest-neighbor algorithm, which captures the local geometric structure of the data. The constraint that the same object in different modalities shares the same semantic label strengthens the correlation between the semantic representations.

2. A Supervised Discriminative Cross-Modal Hashing (SDCH) method. This algorithm casts the learning of semantic representations as a classification problem. While preserving the consistency of semantic representations across modalities, it constrains the representations to be linearly separable in the latent semantic space, making them more discriminative and improving cross-modal retrieval accuracy.

3. A Deep Semantic Matching (DSM) algorithm. This method fine-tunes the AlexNet image deep neural network and a text deep neural network, then extracts top-level feature representations of images and text respectively. The final class-probability vector serves as the latent semantic feature of each image and text, enabling direct semantic matching of images and text in the semantic space. Deep semantic matching uses deep learning to abstract low-level features into top-level semantics, realizing latent semantic correlation between the modalities at the feature level and enabling mutual retrieval of images and texts.

Cross-modal retrieval experiments are carried out on the Wiki single-label dataset and the NUS-WIDE multi-label dataset. The results show that the proposed methods have clear advantages over state-of-the-art methods, demonstrating their validity and feasibility.
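Within the ASPCH method described above, intra-modal similarity preservation rests on a nearest-neighbor graph over each modality's features. The following is a minimal sketch of how such an affinity matrix might be built; the function name, the choice of Euclidean distance, and the binary 0/1 weighting are illustrative assumptions, not the thesis's exact formulation:

```python
import numpy as np

def knn_affinity(X, k=2):
    """Build a symmetric k-nearest-neighbor affinity matrix:
    S[i, j] = 1 if j is among i's k nearest neighbors (or vice versa)."""
    n = len(X)
    # Pairwise squared Euclidean distances via broadcasting.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.zeros((n, n))
    for i in range(n):
        # Skip the point itself (index 0 after sorting), keep the k closest.
        nn = np.argsort(d2[i])[1:k + 1]
        S[i, nn] = 1
    return np.maximum(S, S.T)  # symmetrize the graph

# Toy features: two tight clusters of two points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(knn_affinity(X, k=1))
```

With `k=1` each point links only to its cluster mate, so the affinity matrix is block-diagonal; a hashing objective that preserves this graph keeps such local neighbors close in the latent semantic space.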
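The matching step of the DSM method can be sketched as follows. This assumes the fine-tuned image and text networks have already produced logits over the same shared label set; the function names are hypothetical, and cosine similarity stands in for whichever similarity measure the thesis actually uses:

```python
import numpy as np

def softmax(z):
    # Turn network logits into a class-probability vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_prob, gallery_probs):
    """Rank items of the other modality by similarity of their
    class-probability vectors to the query's vector."""
    sims = [cosine_sim(query_prob, g) for g in gallery_probs]
    return sorted(range(len(gallery_probs)), key=lambda i: -sims[i])

# Toy example: image and text logits over 3 shared categories.
img_logits = np.array([2.0, 0.1, -1.0])     # image network output (hypothetical)
txt_logits = [np.array([1.8, 0.0, -0.5]),   # text about the same category
              np.array([-1.0, 2.5, 0.3])]   # text about a different category
query = softmax(img_logits)
gallery = [softmax(t) for t in txt_logits]
print(retrieve(query, gallery))  # → [0, 1]
```

Because both modalities are projected into the same probability simplex over the label set, matching reduces to a plain nearest-neighbor search in that shared space.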
Keywords/Search Tags: Latent semantic space, k-nearest neighbor, Discriminative power, Hashing, Semantic Matching, Neural Networks