
Cross-modal Event Retrieval Based On Deep Semantic Learning

Posted on: 2020-02-08  Degree: Master  Type: Thesis
Country: China  Candidate: T R W Si  Full Text: PDF
GTID: 2428330596495457  Subject: Computer technology
Abstract/Summary:
Recently, cross-modal retrieval has drawn much attention due to the rapid growth of multimodal data such as text, image, audio and video on the Internet. It is common that different types of data are used to describe the same events or topics. Cross-modal retrieval takes one type of data as the query to retrieve data of another type that belongs to the same semantic label. For example, a user can use a piece of news to retrieve relevant pictures or videos. Cross-modal retrieval has many applications, including hot topic detection, personalized recommendation and search engines. With the rapid growth of multimodal data, it becomes difficult for users to search information of interest efficiently, and various methods have been proposed to address this problem. To date, many techniques have been developed for indexing and searching multimedia data. However, these techniques are mostly single-modality-based and can be divided into keyword-based retrieval and content-based retrieval; they only perform similarity search within the same media type. Since the query and its retrieved results can be of different modalities, how to measure the content similarity between different modalities of data remains a challenge.

The main contributions of this work are the following:

1. We propose a new research topic called cross-modal event retrieval by combining cross-modal retrieval in the multimedia field with event detection in the social media field. We propose Deep Semantic Space (DSS) for cross-modal retrieval, which exploits deep learning models to jointly extract semantic features from images and textual articles, so that different modalities of data are converted into an isomorphic semantic space. More specifically, a VGG network is used to transfer deep semantic knowledge from a large-scale image dataset to the target image dataset. Meanwhile, the domain discrepancy problem is addressed by minimizing the Maximum Mean Discrepancy (MMD) between source-domain and target-domain data of the same modality, so that the transfer model better matches the distribution of the target-domain data. Simultaneously, an LSTM is utilized to model semantic representations of textual features for cross-modal event retrieval. Finally, the interactive deep semantic space model is trained by minimizing a semantic regularization loss. In the deep semantic space, different modalities of data are converted into isomorphic vectors, so the distance between data samples can be measured directly by Euclidean distance or cosine similarity. Moreover, the cosine similarity between relevant image-text pairs is maximized, while the cosine similarity between irrelevant image-text pairs is minimized (an illustrative sketch of these objectives follows the abstract).

2. We collect a dataset called the Wiki-Flickr event dataset for cross-modal event retrieval, in which the data are only weakly aligned, unlike the image-text pairs in existing cross-modal retrieval datasets. Furthermore, we build a cross-modal event retrieval system based on the Wiki-Flickr event dataset.

3. Extensive experiments conducted on both the Pascal Sentence dataset and our Wiki-Flickr event dataset show that our DSS outperforms several state-of-the-art approaches.
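
For illustration only, below is a minimal PyTorch-style sketch of the two training signals named in contribution 1: an MMD term that aligns source-domain and target-domain image features, and a cosine-similarity ranking term that pulls matched image-text pairs together and pushes mismatched pairs apart. The linear projections, feature dimensions, RBF kernel, margin and loss weighting are assumptions standing in for the VGG and LSTM encoders and the exact objectives of the thesis; this is not the author's released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

def rbf_mmd(x, y, sigma=1.0):
    """Maximum Mean Discrepancy between two feature batches with an RBF kernel."""
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2          # pairwise squared Euclidean distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

class SemanticSpace(nn.Module):
    """Projects image and text features into one shared (isomorphic) semantic space."""
    def __init__(self, img_dim=4096, txt_dim=300, sem_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, sem_dim)  # stand-in for VGG fc features
        self.txt_proj = nn.Linear(txt_dim, sem_dim)  # stand-in for an LSTM text encoder

    def forward(self, img_feat, txt_feat):
        img_emb = F.normalize(self.img_proj(img_feat), dim=1)
        txt_emb = F.normalize(self.txt_proj(txt_feat), dim=1)
        return img_emb, txt_emb

def cosine_ranking_loss(img_emb, txt_emb, margin=0.2):
    """Raise similarity of matched image-text pairs, lower it for mismatched pairs."""
    sim = img_emb @ txt_emb.t()                      # cosine similarities (rows are unit vectors)
    pos = sim.diag().unsqueeze(1)                    # matched pairs sit on the diagonal
    hinge = F.relu(margin + sim - pos)               # penalize mismatches close to the match
    mask = 1.0 - torch.eye(sim.size(0))              # ignore the diagonal itself
    return (hinge * mask).mean()

# Toy usage: source/target image features feed the MMD term,
# weakly aligned image-text pairs feed the ranking term.
model = SemanticSpace()
src_img, tgt_img, txt = torch.randn(8, 4096), torch.randn(8, 4096), torch.randn(8, 300)
img_emb, txt_emb = model(tgt_img, txt)
src_emb = F.normalize(model.img_proj(src_img), dim=1)
loss = cosine_ranking_loss(img_emb, txt_emb) + 0.1 * rbf_mmd(src_emb, img_emb)
loss.backward()

In this toy setup the MMD term only touches the image branch, mirroring the abstract's use of domain adaptation within a single modality, while the ranking term couples the two branches through the shared semantic space.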
Keywords/Search Tags: Cross-modal retrieval, Event detection, Deep learning, Semantic space