
Research On The Semantic-based Video Analysis And Classification

Posted on: 2010-09-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Jiang
Full Text: PDF
GTID: 1118330338995719
Subject: Computer application technology
Abstract/Summary:
The ever-increasing amount of multimedia information is becoming inaccessible because of the lack of human resources to perform the time-consuming task of annotating it. The major goal of multimedia research is to provide information for pervasive access and use. To achieve this, it is critical to develop technologies that find the points of interest in media chunks. However, current user expectations still far exceed the intelligence of today's computing systems, despite significant progress in automated feature-based and structure-based indexing and retrieval techniques. The solutions currently available have one major drawback: the generic low-level content metadata produced by automated processing represents only perceived content, not its semantics. Thus, more and more research effort is now geared toward modeling and extracting media-intrinsic, as well as media-extrinsic, semantics.

The semantic gap is the key issue blocking communication between humans and computer systems: humans tend to search video by its semantic content, while computer systems treat video data as low-level features such as color, texture, and shape. How to bridge this semantic gap is a very active research area in content-based video retrieval and indexing.

In this dissertation, we present research on the key technologies of content-based video analysis and classification. We first study the three-layer content description model and then extend it to a four-layer model by introducing a visual perceptive layer. The four layers are: basic visual content, visual perceptive content, object content, and scene content. Furthermore, we address the key issues in each layer and propose new algorithms to resolve them. The innovations of this work are as follows. We analyze current content description models and propose a unified framework for semantic content analysis.
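The four-stage analysis framework described above can be sketched as a simple pipeline in which each layer consumes the output of the one below it. The stage names and the toy feature computations here are illustrative assumptions, not the dissertation's actual implementation:

```python
# Hypothetical sketch of the four-stage pipeline: low-level features ->
# mid-level (perceptive) representation -> object content -> scene content.
# A "frame" is modeled here as a flat list of pixel intensities.

def extract_low_level(frame):
    # stand-in for color/texture/shape features: mean intensity
    return {"mean_intensity": sum(frame) / len(frame)}

def mid_level_representation(features):
    # stand-in for a perceptive (saliency-style) score in [0, 1]
    return {"saliency": min(1.0, features["mean_intensity"] / 255.0)}

def detect_objects(representation):
    # a region counts as an "object" only if it is salient enough
    return ["object"] if representation["saliency"] > 0.5 else []

def classify_scene(objects):
    # scene content derived from the object layer
    return "active" if objects else "static"

def analyze(frame):
    return classify_scene(
        detect_objects(mid_level_representation(extract_low_level(frame))))
```

The point of the sketch is the layering, not the toy features: each real stage in the framework (feature extraction, attention modeling, object extraction, scene analysis) slots into one of these functions.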
This framework consists of four stages: low-level feature extraction, mid-level representation, object production, and scene analysis. The main contributions of this dissertation are summarized as follows:

(1) In the visual perceptive content extraction layer, we propose a novel method for detecting visual attention regions in dynamic scenes, which employs the spatiotemporal model SMGDS. Drawing on properties of the Human Visual System (HVS), the motion attention is computed from the homography of feature-point trajectories, while the static attention map is generated using a center-surround descriptor. The spatial and motion attention maps are combined into an overall attention map in a motion-priority fashion, and the perceptive contents are derived from this attention map. To speed up attention map generation, we also propose a fast visual attention map detection approach using the spatiotemporal model SMGTSM. To generate the motion saliency map, we extract feature-point motion vectors as the motion feature and propose a new fuzzy clustering method that uses cluster validation to analyze motion consistency. The spatial saliency map is generated using Gabor filters and a center-surround descriptor. Finally, we fuse the motion and spatial saliency maps in a motion-priority fashion to produce the overall spatiotemporal attention map.

(2) For object content extraction, we focus on surveillance video. To enable foreground object extraction in outdoor scenes, a robust background subtraction technique named AFSDS is proposed, based on adaptive clustering of temporal color/intensity. An unsupervised clustering method models the background with a group of clusters and their weights, which are updated as the background changes. In addition, unimodal or multimodal background distributions are detected adaptively. We also present a novel statistical threshold estimation scheme to determine the thresholds used in our method.
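The motion-priority fusion of the spatial and motion saliency maps can be sketched per pixel. The exact weighting rule below is an assumption (where motion saliency is strong it dominates, and the spatial map fills in the remainder); the dissertation's actual fusion function may differ:

```python
# Hypothetical motion-priority fusion of two per-pixel saliency maps,
# each a flat list of values in [0, 1]. Assumed rule: F = M + (1 - M) * S,
# so strong motion saliency dominates and spatial saliency fills in the rest.

def fuse_motion_priority(spatial, motion):
    """Fuse equally sized spatial and motion saliency maps."""
    return [min(1.0, m + (1.0 - m) * s) for s, m in zip(spatial, motion)]
```

Under this rule a pixel with maximal motion saliency stays maximally salient regardless of its spatial score, while a pixel with no motion falls back to its spatial saliency alone.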
To enable real-time foreground detection, we also propose a real-time foreground detection algorithm called FFSBC. The unimodal or multimodal distributions of the background are detected adaptively. We use a single Gaussian model to represent each cluster, which avoids estimating the parameters of a mixture-of-Gaussians model. These features enable real-time foreground detection in outdoor scenes.

(3) For scene content extraction, we investigate several methods for semantic content analysis according to the characteristics of each type of scene. Person retrieval and indexing in video sequences is a challenging task for many multimedia applications. To index the persons shown in a video, we propose a novel person indexing method called APIV. First, the persons in a shot are detected and tracked with a face detector and the continuously adaptive mean shift (CamShift) algorithm. Then, mid-level features such as clothing colors and voice are used to represent each person. An unsupervised clustering method groups the persons for further indexing, and finally the clusters are validated and refined using the voice feature. With APIV, the persons shown in a video can be indexed automatically.

(4) Video abstraction is a critical method for semantic content analysis, and key-frame-based video summarization is the most popular way to abstract a video clip. We propose a novel key-frame-based video summarization approach using visual attention cues. To bridge the semantic gap between the low-level descriptors used by computer systems and the high-level concepts perceived by human users, a new visual attention index (VAI) descriptor based on a visual attention model is proposed. Both spatial and temporal saliency maps are constructed and fused in a dynamic fashion to produce the overall spatiotemporal attention index. We use the VAI to estimate the attention that viewers are likely to pay to video contents, and the frames with higher VAI values are selected as key-frame candidates.
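The cluster-based background model behind the foreground detection contributions can be sketched as follows. This is a deliberately simplified, hypothetical per-pixel model in the same spirit: each pixel keeps a small set of intensity clusters with weights; an incoming value that matches no sufficiently weighted cluster is labeled foreground. The matching distance, weight threshold, and learning rate are illustrative assumptions, not the thesis's estimated thresholds:

```python
# Simplified, hypothetical cluster-based background subtraction for one pixel.
# Each cluster is [center, weight]; weights grow as a cluster keeps matching,
# so persistent values become background while novel values stay foreground.

class PixelModel:
    def __init__(self, match_dist=20.0, weight_thresh=0.3, lr=0.05):
        self.clusters = []            # list of [center, weight]
        self.match_dist = match_dist  # max distance to match a cluster
        self.weight_thresh = weight_thresh
        self.lr = lr                  # learning rate for center/weight updates

    def update(self, value):
        """Classify `value` (True = foreground) and adapt the model."""
        for c in self.clusters:
            if abs(value - c[0]) < self.match_dist:
                c[0] += self.lr * (value - c[0])     # pull center toward value
                c[1] = min(1.0, c[1] + self.lr)      # reinforce the cluster
                return c[1] < self.weight_thresh     # weak cluster => foreground
        self.clusters.append([float(value), self.lr])  # new, weak cluster
        return True                                    # novel value => foreground
```

After the same intensity has been observed a handful of times its cluster's weight crosses the threshold and the value is treated as background; a sudden jump to a distant intensity spawns a fresh weak cluster and is flagged as foreground.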
In particular, an adaptive video key-frame extraction technique is demonstrated that simulates the shifting focus points of viewers; the key-frame density within a shot is controlled by shifts of the attention area.
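The VAI-driven selection step can be sketched as picking frames whose attention score peaks locally above a threshold. The peak-picking rule and threshold are assumptions standing in for the adaptive extraction technique described above:

```python
# Hypothetical key-frame candidate selection from per-frame visual
# attention index (VAI) scores in [0, 1]: keep frames that are local
# attention maxima and exceed a fixed threshold.

def select_keyframes(vai, thresh=0.5):
    """Return indices of key-frame candidates from a list of VAI scores."""
    keys = []
    for i in range(1, len(vai) - 1):
        if vai[i] >= thresh and vai[i] >= vai[i - 1] and vai[i] >= vai[i + 1]:
            keys.append(i)  # local attention peak above threshold
    return keys
```

In the adaptive scheme described above, the fixed threshold would instead vary with attention-area shifts within the shot, raising or lowering the key-frame density accordingly.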
Keywords/Search Tags: semantic, content analysis, video summarization, object extraction, video retrieval