Cross-modal Multimedia Information Retrieval

Posted on: 2016-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: L X Shi
Full Text: PDF
GTID: 2308330461968118
Subject: Computer software and theory
Abstract/Summary:
Content-based multimedia retrieval has been an active topic in multimedia information retrieval since the early 1990s, and it is also an attractive research direction in computer vision. It generally integrates techniques from statistical analysis, pattern recognition, machine learning and human-computer interaction. Its main purpose is to remedy the limitations of traditional approaches that rely only on text, including laborious and time-consuming manual annotation and the subjectivity of manual selection. In addition, traditional retrieval methods can handle only a single type of media, such as image, video or audio, and cannot retrieve objects across different media types. As the technology develops, there is an urgent need for retrieval methods that work across different modalities of multimedia data. This thesis studies cross-modal retrieval, which can flexibly process and query different forms of multimedia data.
Most existing retrieval methods for images and videos work by searching associated text. For example, Google returns images for a set of keywords that are derived mainly from the text associated with an image or from its manual annotation. However, because annotators differ in cultural background and professional knowledge, the textual information can be confusing and unreliable. It is difficult to find an effective and accurate description that characterizes the content of images and videos. As a result, traditional retrieval methods have relatively low precision and struggle to meet demand.
This thesis first reviews the relevant technologies of multimedia information retrieval and summarizes four typical approaches to cross-modal retrieval: linear iteration and mapping, nonlinear manifolds, probabilistic models, and heterogeneity analysis.
The thesis then proposes three novel methods for cross-modal information retrieval, each of which can generalize the patterns of different multimedia data. Canonical Correlation Analysis (CCA) is used to learn and model the latent correlation between different multimedia data in order to improve performance. The first method performs cross-modal multimedia information retrieval based on doc2vec and ITQ. The second is based on the LDA model and ITQ. The third fuses multiple features for cross-modal information retrieval, and two different fusion schemes are put forward for it. All three methods aim to bridge the different modalities (image, text, video, audio) of multimedia information, each in a different way.
The effectiveness of these approaches is evaluated on cross-modal multimedia retrieval tasks, namely retrieving text from an image query and retrieving images from a text query. Two corpora are used in the experiments: English Wikipedia data (EG-wikipedia) and Chinese Wikipedia data (CH-wikipedia). Empirical results show that the proposed methods achieve better performance.
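To make the role of CCA and ITQ more concrete, the following is a minimal sketch and not the thesis's actual pipeline. It assumes that ITQ stands for iterative quantization (the abstract does not expand the acronym), replaces the doc2vec/LDA text features and the image features with synthetic vectors that share a hidden latent factor, and uses hypothetical helper names (`cca`, `itq_rotation`, `binarize`). It projects both modalities into a common CCA space, learns a rotation that reduces binarization error, and compares Hamming distances of matched and unmatched image-text pairs.

```python
import numpy as np

def cca(X, Y, k, reg=1e-3):
    """Linear CCA via whitening + SVD. Returns projections and feature means."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])  # regularized covariances
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return (V * w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, _, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return Kx @ U[:, :k], Ky @ Vt.T[:, :k], mx, my

def binarize(V):
    """Map real values to codes in {-1, +1}."""
    return np.where(V >= 0, 1.0, -1.0)

def itq_rotation(V, n_iter=50, seed=0):
    """Alternate between codes B = sign(VR) and the orthogonal R that
    minimizes ||B - VR||_F (an orthogonal Procrustes problem)."""
    rng = np.random.default_rng(seed)
    k = V.shape[1]
    R, _ = np.linalg.qr(rng.standard_normal((k, k)))  # random orthogonal start
    for _ in range(n_iter):
        B = binarize(V @ R)
        U, _, Wt = np.linalg.svd(V.T @ B)
        R = U @ Wt
    return R

# Synthetic stand-ins for paired image and text features (assumption).
rng = np.random.default_rng(0)
n, d_img, d_txt, k = 300, 20, 15, 8
Z = rng.standard_normal((n, k))  # shared latent "semantics"
X = Z @ rng.standard_normal((k, d_img)) + 0.1 * rng.standard_normal((n, d_img))
Y = Z @ rng.standard_normal((k, d_txt)) + 0.1 * rng.standard_normal((n, d_txt))

Wx, Wy, mx, my = cca(X, Y, k)
Px, Py = (X - mx) @ Wx, (Y - my) @ Wy        # both modalities in the CCA space
R = itq_rotation(np.vstack([Px, Py]))        # one rotation shared by both
Bx, By = binarize(Px @ R), binarize(Py @ R)  # k-bit codes per item

hamming = (k - Bx @ By.T) / 2                # image-to-text Hamming distances
paired = np.diag(hamming).mean()
unpaired = (hamming.sum() - np.trace(hamming)) / (n * (n - 1))
print("mean Hamming distance, matched pairs:  ", paired)
print("mean Hamming distance, unmatched pairs:", unpaired)
```

In this sketch, cross-modal retrieval amounts to ranking the rows of `By` by Hamming distance to a query row of `Bx` (or the reverse for text-to-image queries). Real image and text features would replace `X` and `Y`.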
Keywords/Search Tags: Cross-modal, Cross-modal retrieval, Image retrieval, Canonical Correlation Analysis