
Research On Cross-modal Multimedia Retrieval Method Based On Neural Network

Posted on: 2019-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhang
Full Text: PDF
GTID: 2438330548454999
Subject: Computer software and theory
Abstract/Summary:
Cross-modal multimedia retrieval is becoming an increasingly valuable research issue in the domain of information retrieval. With the coming era of big data, multimodal data grow rapidly, and the retrieval of single-modal data can no longer meet users' needs in many domains; cross-modal multimedia retrieval has therefore been proposed. Its main idea is to establish correlations among data of different modalities. In this paper, we focus on cross-modal retrieval between images and texts, which includes two retrieval paradigms: using query images to retrieve semantically relevant texts, and using query texts to retrieve semantically relevant images. We employ sparse neural networks pre-trained by deep Restricted Boltzmann Machines to study the application of neural networks to cross-modal multimedia retrieval. The methods are as follows:

1. A cross-media retrieval method named cross-media semantic matching is proposed. We employ two independent deep neural networks to map the low-level features of images and texts into their semantic subspace. Specifically, we use the low-level features of training images and texts, together with their labels, to train the two networks. We then input the low-level features of testing images and texts into the trained networks and regard the top-level outputs as the semantic-subspace representations of the testing images and texts. The method relies purely on semantic information for cross-media retrieval, so it does not require an explicit interpretation of low-level features (e.g., lines and edges for images, or words and sentences for texts). Furthermore, it considers the semantic information of isomorphic media data as well as the semantic consistency of heterogeneous media data.

2. A cross-media retrieval method named modality-reconstructed cross-media retrieval is proposed. We employ a single deep neural network to map the low-level features of images directly into the space of textual features. Specifically, we use the low-level features of training images and texts to train the network; we then input the low-level features of testing images into the trained network and regard the top-level outputs as textual-feature representations of the testing images. Because the mapping goes directly from image features to the textual feature space, the method can omit the intermediate isomorphic subspace between images and texts. The method is unsupervised and does not need any labeled samples.

3. A cross-media retrieval method named cross-media retrieval with collective deep semantic learning is proposed. We again employ two independent deep neural networks to map the low-level features of images and texts into their semantic subspace, and we use collective deep semantic learning to exploit the potential semantic information of unlabeled data. Specifically, two complementary neural networks are first trained to map the low-level features of images and texts into their semantic subspace. Based on these networks, weak semantic labels are generated for unlabeled images and texts; the weakly labeled data are then exploited together with the pre-labeled training samples to retrain the retrieval model. The method starts from the global structure of the multimedia data: it exploits the potential semantic information of unlabeled data and strengthens weak semantic labels into strong ones through collective deep semantic learning. This enhances both the discriminative capability and the semantic modeling capability of the retrieval model, identifying a more semantically meaningful subspace for cross-media retrieval.
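The first method, cross-media semantic matching, can be sketched as follows. This is a minimal NumPy illustration, not the thesis implementation: the two RBM-pretrained sparse networks are replaced by small two-layer MLPs with random stand-in weights, the feature dimensions (128-d image features, 64-d text features, 10 semantic classes) are assumed for illustration, and retrieval is done by cosine similarity in the shared semantic subspace.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, weights):
    """Forward pass through a small MLP; softmax output plays the role of
    the top-level semantic-subspace representation."""
    for W in weights[:-1]:
        x = np.maximum(x @ W, 0.0)               # ReLU hidden layers
    logits = x @ weights[-1]
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)      # per-sample class probabilities

# Random stand-ins for the two independently trained networks
img_net = [rng.normal(size=(128, 32)), rng.normal(size=(32, 10))]
txt_net = [rng.normal(size=(64, 32)), rng.normal(size=(32, 10))]

img_feats = rng.normal(size=(5, 128))            # 5 test images
txt_feats = rng.normal(size=(8, 64))             # 8 test documents

img_sem = mlp_forward(img_feats, img_net)        # both modalities now live in
txt_sem = mlp_forward(txt_feats, txt_net)        # the same 10-d semantic subspace

def cosine(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    return (a @ b.T) / (np.linalg.norm(a, axis=1, keepdims=True)
                        * np.linalg.norm(b, axis=1))

# Image-to-text retrieval: rank all texts for each query image
ranking = np.argsort(-cosine(img_sem, txt_sem), axis=1)
```

Text-to-image retrieval is the symmetric case: rank `img_sem` rows against each query row of `txt_sem`.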
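The second method, modality-reconstructed cross-media retrieval, maps image features directly into the textual feature space without labels. In this sketch the deep network is replaced by a least-squares linear map (a deliberate simplification), trained on hypothetical co-occurring image/text feature pairs; retrieval is then nearest-neighbor search in the text-feature space.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical co-occurring pairs: 100 training images (128-d) with their
# accompanying texts (64-d). No class labels are used anywhere.
train_img = rng.normal(size=(100, 128))
W_true = rng.normal(size=(128, 64))
train_txt = train_img @ W_true + 0.1 * rng.normal(size=(100, 64))

# Stand-in for the deep network: least-squares map image space -> text space
W, *_ = np.linalg.lstsq(train_img, train_txt, rcond=None)

test_img = rng.normal(size=(3, 128))
reconstructed_txt = test_img @ W     # test images "reconstructed" as text features

# Retrieve: for each query image, find the nearest text in textual-feature space
dists = np.linalg.norm(reconstructed_txt[:, None, :] - train_txt[None, :, :], axis=2)
nearest = dists.argmin(axis=1)       # index of the best-matching text per image
```

Because the map lands directly in the text-feature space, no intermediate shared subspace needs to be learned, mirroring the property claimed above.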
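The third method's retraining loop is essentially self-training with weak labels. The following sketch replaces the deep semantic networks with a nearest-centroid classifier (an assumption for brevity) on synthetic clustered features: a model trained on the labeled samples generates weak labels for unlabeled data, the most confident weak labels are kept, and the model is retrained on the enlarged set.

```python
import numpy as np

rng = np.random.default_rng(2)

def centroid_fit(X, y, k):
    """Per-class mean vectors as a stand-in for the semantic-subspace model."""
    return np.stack([X[y == c].mean(axis=0) for c in range(k)])

def centroid_predict(X, centroids):
    """Return (weak labels, confidence = negative distance to nearest centroid)."""
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    return d.argmin(axis=1), -d.min(axis=1)

# Synthetic data: k well-separated classes in a 16-d feature space
k, dim = 3, 16
centers = rng.normal(scale=5.0, size=(k, dim))
labeled_y = np.repeat(np.arange(k), 10)                  # 30 labeled samples
labeled_X = centers[labeled_y] + rng.normal(size=(30, dim))
unlab_y_true = rng.integers(0, k, size=200)              # hidden ground truth
unlabeled_X = centers[unlab_y_true] + rng.normal(size=(200, dim))

# Round 1: model trained on the pre-labeled samples only
model = centroid_fit(labeled_X, labeled_y, k)
weak_labels, conf = centroid_predict(unlabeled_X, model)

# Keep the most confident weak labels and retrain on labeled + weak data
keep = conf > np.median(conf)
X2 = np.concatenate([labeled_X, unlabeled_X[keep]])
y2 = np.concatenate([labeled_y, weak_labels[keep]])
model2 = centroid_fit(X2, y2, k)                         # retrained model
```

The confidence filter is one simple way to "strengthen" weak labels before retraining; the thesis's collective deep semantic learning operates on the networks' semantic outputs instead.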
Keywords/Search Tags: Cross-Modal Multimedia Retrieval, Deep Neural Network, Semantic Matching, Modality-Reconstructed Cross-Media Retrieval, Collective Deep Semantic Learning