
Research On Statistics-Based Analysis And Extraction For Video Semantic

Posted on: 2007-02-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Wei
Full Text: PDF
GTID: 1118360185991693
Subject: Computer application technology

Abstract/Summary:
Video content analysis is an important research issue in multimedia information processing. An increasing number of video and multimedia applications rely on interactively manipulating, distributing, indexing, and accessing video based on its content. However, most content analysis techniques are based on low-level features (color, texture, shape, etc.), which are abstract and quite different from the semantic concepts in human thought. It is therefore urgent to represent content with high-level concepts. To bridge the gap between low-level features and high-level semantics, extracting high-level semantic concepts has become one of the most challenging research topics in the multimedia domain.

Based on statistical theory, a generic framework for multimedia content semantic analysis is proposed in this dissertation. Multilayer semantic analysis and multimodal information fusion are unified in the same model. First, a gradual-transition detection method based on statistical distributions is used to segment shots; it uses the scalable color descriptor as the basic feature for the dissimilarity measure and embeds the intensity variance along the corresponding series of frames. Second, a frame-segment key-frame strategy and an attention selection model are used to represent video content concisely. With pattern classification techniques, the basic visual semantics are recognized. A multilayer structure model is then used to extract multi-level visual semantics. After that, an audio semantic analysis scheme is presented using spectrum features extracted with the Fourier transform. Finally, a bionic multimodal fusion method with a two-level structure is proposed for video semantic concept analysis.

The main contributions of this dissertation are summarized as follows:

(1) A generic framework for understanding and extracting visual semantics at different semantic granularities is proposed.
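The abstract does not spell out the detection rule, so the sketch below only illustrates the general idea of histogram-based gradual-transition detection under stated assumptions: a plain gray-level histogram stands in for the MPEG-7 scalable color descriptor, cumulative inter-frame dissimilarity over a sliding window is combined with the intensity-variance trend, and the window size and threshold are hypothetical, not the dissertation's values.

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Coarse gray-level histogram; a simplified stand-in for the
    MPEG-7 scalable color descriptor (assumption, not the thesis method)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def dissimilarity(h1, h2):
    """L1 distance between two normalized histograms."""
    return float(np.abs(h1 - h2).sum())

def detect_gradual_transitions(frames, window=5, thresh=0.3):
    """Flag window starts where cumulative inter-frame dissimilarity is
    high AND the intensity variance drifts across the window, which is
    one cue for a gradual transition (fade/dissolve)."""
    hists = [color_histogram(f) for f in frames]
    variances = [float(f.var()) for f in frames]
    boundaries = []
    for i in range(len(frames) - window):
        cum = sum(dissimilarity(hists[j], hists[j + 1])
                  for j in range(i, i + window))
        var_trend = variances[i + window] - variances[i]
        if cum > thresh and abs(var_trend) > 0:
            boundaries.append(i)
    return boundaries
```

On a synthetic clip (static frames, a linear fade, then static frames again), the flagged indices cluster around the fade, while abrupt-cut detectors keyed to a single large frame-to-frame jump would miss it.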
A spatio-temporal selective attention model, a neuromorphic model that simulates the human visual system, is used to select dynamic and static salient areas. These areas are classified into basic visual semantics, using a feature selection algorithm based on the approximated Bayesian error (ABFSA) to select the most important features from a high-dimensional feature set. A fixed-length combination partition method (FLCPM) is presented to improve recognition precision for basic visual semantics with multi-normal-distribution attributes. Treating high-level visual semantics as hidden states, the HHMM incorporates temporal semantic context constraints by...
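The precise form of the approximated Bayesian error used by ABFSA is not given in the abstract. A common surrogate for the Bayes error under Gaussian class-conditional densities is the Bhattacharyya bound, which the sketch below uses to rank features; the function names, the per-feature Gaussian assumption, and the top-k selection rule are all illustrative, not the dissertation's algorithm.

```python
import numpy as np

def bhattacharyya_distance(x1, x2):
    """Bhattacharyya distance between two classes along one feature,
    assuming each class-conditional density is a 1-D Gaussian.
    A larger distance gives a tighter (smaller) Bayes-error bound."""
    m1, m2 = x1.mean(), x2.mean()
    v1, v2 = x1.var() + 1e-12, x2.var() + 1e-12  # guard zero variance
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2))))

def select_features(X, y, k):
    """Score every feature by its two-class Bhattacharyya distance
    and keep the indices of the k most discriminative ones."""
    scores = np.array([bhattacharyya_distance(X[y == 0, j], X[y == 1, j])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]
```

With two well-separated class means on one feature and pure noise on the others, the discriminative feature receives the highest score and is selected first.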
Keywords/Search Tags:Video Semantic Analysis, Feature Extraction, Video Semantic Concept, Semantic Gap, Multimodal Fusion, Bayesian Classification, HMMs, HHMM, MPEG-7