
Study On Techniques Of Multi-Modality Media Information Retrieval

Posted on: 2014-03-20    Degree: Doctor    Type: Dissertation
Country: China    Candidate: B Lu    Full Text: PDF
GTID: 1228330467479933    Subject: Computer software and theory
Abstract/Summary:
With the rapid development of multimedia technology and the Internet, the volume of heterogeneous multimedia data available online has increased dramatically. Compared with traditional single-modality retrieval methods, the analysis and processing of multi-modality media data not only better represent the user's retrieval intention but also play an important role in understanding the semantics of multimedia content. However, traditional multimedia retrieval methods cannot efficiently process multi-modality media data because of the complexity of, and the heterogeneity among, the different low-level features. Consequently, how to effectively manage and retrieve multi-modality media data has become a hot topic in the field of multimedia retrieval.

Based on the characteristics of semantic correlation among multi-modality media data, this dissertation conducts a thorough study of multi-modality media information retrieval by exploring machine learning and multi-modality information fusion. The main contributions are as follows.

For semantic concept detection in video shots, we propose an extreme learning machine (ELM) based multi-modality classifier combination framework to improve the accuracy of semantic concept detection. First, three ELM classifiers are trained on three kinds of visual features respectively. Then, a probability-based fusion method is proposed to combine the prediction results of the individual ELM classifiers. Finally, we integrate the fused predictions with the contextual correlation among concepts to further improve detection accuracy.
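The classifier-combination idea above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: it assumes a bare-bones ELM (random input weights, least-squares output weights) and uses simple probability averaging as the fusion step; the toy "feature views" and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class ELM:
    """Single-hidden-layer extreme learning machine: random input
    weights, output weights solved in closed form by least squares."""
    def __init__(self, n_hidden=50):
        self.n_hidden = n_hidden

    def fit(self, X, Y):
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)      # hidden-layer activations
        self.beta = np.linalg.pinv(H) @ Y     # least-squares output weights
        return self

    def predict_proba(self, X):
        H = np.tanh(X @ self.W + self.b)
        scores = H @ self.beta
        # softmax to turn raw outputs into pseudo-probabilities
        e = np.exp(scores - scores.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

# toy data: three "visual feature" views of the same shots, two concepts
n = 200
labels = rng.integers(0, 2, size=n)
Y = np.eye(2)[labels]                          # one-hot targets
views = [rng.normal(size=(n, 10)) + labels[:, None] * s
         for s in (1.0, 0.8, 0.6)]             # hypothetical feature views

clfs = [ELM().fit(X, Y) for X in views]
# probability-based fusion: average the per-classifier probabilities
fused = np.mean([c.predict_proba(X) for c, X in zip(clfs, views)], axis=0)
accuracy = (fused.argmax(axis=1) == labels).mean()
```

In this sketch the fusion weights are uniform; a probability-based combination as described above could instead weight each classifier by its validation performance.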
Experiments on the widely used TRECVID dataset demonstrate that the proposed method effectively improves the accuracy of semantic concept detection at extremely high speed.

For uncertain semantic representations of videos, we propose a novel multi-information fusion approach (MIF) based on a two-phase framework consisting of an inferring phase and a fusing phase. In the inferring phase, the concepts most relevant to the user's query are chosen by exploring both the contextual correlation among concepts and the temporal correlation among video shots. In the fusing phase, the inferred probabilities of the related concepts are fused with the detection results by minimizing a potential function, thereby refining the detector predictions. Extensive experiments demonstrate that the proposed method effectively handles the uncertain representation of videos and improves the accuracy of semantic video retrieval.

For large-scale cross-media retrieval, a multi-modality semantic relationship graph (MSRG) is first constructed using the semantic correlation among multi-modality media objects. Second, all media objects in the MSRG are mapped onto an isomorphic semantic space. Furthermore, an efficient index based on heterogeneous data distribution, the MK-tree, is proposed to manage the media objects within the semantic space and improve the performance of cross-media retrieval. Extensive experiments on a real large-scale cross-media dataset indicate that the proposed approach dramatically improves the accuracy and efficiency of cross-media retrieval, significantly outperforming existing methods.

For social image retrieval, we propose a novel framework for social image search based on a social relationship graph (SRGSIS), which involves two stages. In the first stage, heterogeneous data from multi-modality information sources are used to build a social relationship graph.
Then, for the given query keywords, we execute an efficient keyword search algorithm over the social relationship graph and obtain the top-k candidate results ranked by relevance score. In the second stage, each image in the social relationship graph is represented as a region adjacency graph; these region adjacency graphs are organized into a closure tree, and approximate graph similarity is computed between the candidate results and the closure tree to obtain more desirable results. Extensive experimental results demonstrate the effectiveness and accuracy of the proposed approach.
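A toy sketch of the stage-one idea, keyword search over a relationship graph with top-k ranking, might look like the following. The tiny graph, keyword sets, and scoring rule (keyword overlap plus closeness to the nearest matching node) are all illustrative assumptions, not the dissertation's actual algorithm or data.

```python
from collections import deque

# hypothetical social relationship graph: nodes are images/users/tags,
# edges link related objects; each node carries a keyword set
graph = {
    "img1": {"img2", "user1"},
    "img2": {"img1", "tag_beach"},
    "user1": {"img1", "img3"},
    "img3": {"user1", "tag_beach"},
    "tag_beach": {"img2", "img3"},
}
keywords = {
    "img1": {"sunset"},
    "img2": {"beach", "sunset"},
    "img3": {"beach"},
    "user1": set(),
    "tag_beach": {"beach"},
}

def keyword_search(query, k=3):
    """Score each node by direct keyword overlap plus closeness
    (inverse BFS distance) to the nearest node matching the query,
    then return the top-k node ids by relevance score."""
    scores = {}
    for node in graph:
        direct = len(query & keywords[node])
        # BFS distance from this node to the nearest matching node
        seen, q, dist = {node}, deque([(node, 0)]), None
        while q:
            cur, d = q.popleft()
            if query & keywords[cur]:
                dist = d
                break
            for nb in graph[cur]:
                if nb not in seen:
                    seen.add(nb)
                    q.append((nb, d + 1))
        closeness = 1.0 / (1 + dist) if dist is not None else 0.0
        scores[node] = direct + closeness
    return sorted(scores, key=scores.get, reverse=True)[:k]

top = keyword_search({"beach", "sunset"})
```

A real system at this scale would of course pair the graph traversal with an inverted index from keywords to nodes rather than scanning every node; the sketch only illustrates the relevance-ranked top-k retrieval step.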
Keywords/Search Tags: semantic concept detection, multi-modality, cross-media retrieval, social image retrieval, multi-modality semantic relationship graph, social relationship graph