Research On Video Retrieval With Structural Data

Posted on: 2009-09-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z W Gu
Full Text: PDF
GTID: 1118360242495816
Subject: Signal and Information Processing
Abstract/Summary:
In recent years, the amount of video data has surged to an unprecedented level. Video plays an increasingly important role in daily life, and Internet video sharing is expected to keep growing over the next several years, or even decades. As a result, video content analysis and video retrieval are becoming central issues in video research. Content-based video retrieval (CBVR) is a challenging technique of both theoretical and practical importance. It has made tremendous progress in the past several years, and some prototype systems have been developed for small commercial search engines. Generalized video structuring plays a key role in CBVR. Raw video data is unstructured, so it must first be organized into structured data using appropriate models; video analysis, indexing and querying are then performed on the basis of the organized structure. The objective of this thesis is to study the structural characteristics of video content and to design efficient machine learning algorithms that use these characteristics for high-level semantic understanding. These algorithms attempt to narrow the "semantic gap" between low-level features and high-level semantics, either automatically or with little manual labor, and ultimately to improve retrieval performance.

In this thesis, we take the hierarchical structure as the clue for analyzing the semantics of video content. We propose algorithms suited to each level of the hierarchy, i.e. the image level, the shot level and the scene level. The main contributions are summarized as follows.

1. For image-level retrieval based on global information, we propose to perform multiple sampling by integrating AdaBoost and SVM. A small number of helpful features are selected with classification accuracy as the selection criterion, while the weak classifiers are boosted into a strong classifier.

2. For image-level retrieval based on regional information, we model the image structure with multiple-instance learning, which belongs to the structural learning framework, and introduce multiple-instance active learning (MIAL) to reduce manual labeling and address the shortage of labeled samples. We analyze the characteristics of MIAL and categorize it into three paradigms, i.e. bag-level, instance-level and mixture-level active learning. For bag-level MIAL, we propose a sample selection strategy that takes the statistics of the instance number as an important measure and combines it with the uncertainty of the samples. The experimental results demonstrate the effectiveness of the proposed algorithm.

3. As the shot is the basic physical unit of video, video retrieval is usually performed at the shot level. We study the intrinsic hierarchical structure of video content and propose the multi-layer multi-instance (MLMI) learning framework, which combines structural learning with multiple-instance learning and can model video content in a natural way. We discuss the problems that must be solved in multi-layer multi-instance learning and design a complete framework composed of several algorithms for these problems. First, an MLMI kernel is constructed to measure the similarity between such structures. To weight the contributions of the instances, we further apply marginalization and propose the marginalized MLMI kernel. To deal with the ambiguity propagation problem introduced by weak labeling and the multi-layer structure, we then propose a regularization framework that takes several explicit constraints into consideration, i.e. hyper-bag prediction error, sub-layer prediction error, an inter-layer inconsistency measure, and classifier complexity. With this framework, the MLMI learning problem is solved satisfactorily.

4. The scene is regarded as the basic semantic unit of video. It is more abstract and more summarizing than the shot, so employing scene information in semantic understanding can benefit semantic-level applications such as video retrieval and management. We propose an energy minimization based scene segmentation (EMS) algorithm that simultaneously takes into account both the global distribution of time and content and the local temporal continuity. Moreover, a scheme for fusing scene segmentation with automatic speech recognition (ASR) results is proposed and adopted in video retrieval.
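
The abstract does not give the form of the MLMI kernel, so the following is only an illustrative sketch of how similarity between nested bags might be measured, not the kernel proposed in the thesis. It assumes a two-layer structure (a shot is a bag of key frames, a frame is a bag of region feature vectors) and swaps in a plain averaged set kernel applied recursively, with an RBF base kernel. All names (`rbf`, `set_kernel`, `frame_kernel`, `shot_kernel`, `normalized_shot_kernel`) and the `gamma` value are hypothetical.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def set_kernel(bag_a, bag_b, base_kernel):
    """Average of base_kernel over all cross pairs drawn from the two bags."""
    total = sum(base_kernel(a, b) for a in bag_a for b in bag_b)
    return total / (len(bag_a) * len(bag_b))


def frame_kernel(frame_a, frame_b):
    """Lower layer (assumed): a frame is a bag of region feature vectors."""
    return set_kernel(frame_a, frame_b, rbf)


def shot_kernel(shot_a, shot_b):
    """Upper layer (assumed): a shot is a bag of frames."""
    return set_kernel(shot_a, shot_b, frame_kernel)


def normalized_shot_kernel(shot_a, shot_b):
    """Scale the shot kernel so that every shot has self-similarity 1."""
    norm = math.sqrt(shot_kernel(shot_a, shot_a) * shot_kernel(shot_b, shot_b))
    return shot_kernel(shot_a, shot_b) / norm


# Toy data: each shot is a list of frames, each frame a list of 2-D region features.
shot_1 = [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 1.0]]]
shot_2 = [[[0.1, 0.0]], [[1.0, 1.0], [0.9, 0.2]]]

similarity = normalized_shot_kernel(shot_1, shot_2)
```

Because every pair of instances contributes equally here, this sketch has no notion of instance weighting; the marginalized MLMI kernel described in contribution 3 is what addresses that in the thesis.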
Keywords/Search Tags: Content-based video retrieval, semantic analysis, AdaBoost-SVM, multi-instance active learning, multi-layer multi-instance learning, kernel method, scene segmentation