
Automatic segmentation, indexing and retrieval of audiovisual data based on combined audio and visual content analysis

Posted on: 2000-05-12
Degree: Ph.D.
Type: Thesis
University: University of Southern California
Candidate: Zhang, Tong
GTID: 2468390014461948
Subject: Engineering
Abstract/Summary:
This thesis proposed a system for the automatic segmentation, indexing and retrieval of audiovisual data based on multimodal content analysis. The goal was to generate metadata for video sequences to support information filtering and retrieval. The audiovisual stream was demultiplexed into separate media types such as audio, image and caption, and an index table was generated for each video clip by combining the results of content analysis on these media types. Structures for different video types were described, and a separate model was built for each video type. This general approach to modeling and structuring video content is unique: it supports more functions than existing approaches, which normally adopt a single model focused on pictorial information alone.

For content-based management of audiovisual data, a three-stage hierarchical system was developed. In the first stage, the accompanying audio signal was segmented on-line and classified into twelve basic sound types. Segment boundaries were set precisely, and a classification accuracy above 90% was achieved. This procedure is generic and model-free. In the second stage, environmental sounds were classified at a finer level using hidden Markov models, reaching an accuracy of 86% in experiments. Finally, building on this classification approach, a query-by-example retrieval scheme for sound effects was proposed and shown to be very effective.

For content analysis of image sequences, an efficient and robust shot change detection method was developed. It extends the twin-comparison algorithm with a new ingredient: the histogram differences of the Y and V components are incorporated. On various kinds of test video, the method achieved both a sensitivity rate and a recall rate of around 95%.
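The twin-comparison idea can be illustrated with a small sketch. The abstract does not specify the histogram resolution, the distance measure, the thresholds, or how the Y and V differences are combined, so the following assumes 16-bin histograms of 8-bit values, a normalized L1 distance, and an unweighted sum of the Y and V terms. All function names are hypothetical. A frame-to-frame difference above `t_high` is reported as a cut. A difference between `t_low` and `t_high` opens a candidate gradual transition, which is accepted only if the accumulated difference between its first and last frames exceeds `t_high`.

```python
# Minimal sketch of twin-comparison shot detection using Y and V histograms.
# Assumptions (not from the thesis): 16-bin histograms of 8-bit values,
# normalized L1 differences, and an unweighted sum of the Y and V terms.

BINS = 16


def histogram(values, bins=BINS):
    """Count 8-bit values (0-255) into `bins` equal-width bins."""
    counts = [0] * bins
    for v in values:
        counts[min(v * bins // 256, bins - 1)] += 1
    return counts


def hist_diff(h1, h2):
    """Normalized L1 distance between two histograms, in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / sum(h1)


def frame_diff(f1, f2):
    """Combined Y + V histogram difference; a frame is (y_pixels, v_pixels)."""
    (y1, v1), (y2, v2) = f1, f2
    return (hist_diff(histogram(y1), histogram(y2))
            + hist_diff(histogram(v1), histogram(v2)))


def twin_comparison(frames, t_low, t_high):
    """Return (start, end, kind) tuples; kind is "cut" or "gradual"."""
    transitions = []
    start = None  # frame index where a candidate gradual transition began
    for i in range(1, len(frames)):
        d = frame_diff(frames[i - 1], frames[i])
        if start is None:
            if d >= t_high:                 # large jump: abrupt cut
                transitions.append((i - 1, i, "cut"))
            elif d >= t_low:                # moderate change: open a candidate
                start = i - 1
        elif d < t_low:
            # Candidate ends; accept it only if the accumulated change
            # between its first and last frames exceeds the high threshold.
            if frame_diff(frames[start], frames[i - 1]) >= t_high:
                transitions.append((start, i - 1, "gradual"))
            start = None
    # A candidate still open at the end of the sequence is checked the same way.
    if start is not None and frame_diff(frames[start], frames[-1]) >= t_high:
        transitions.append((start, len(frames) - 1, "gradual"))
    return transitions
```

For clarity, `frame_diff` recomputes histograms on every call. A practical version would compute one Y and one V histogram per frame and reuse them. The original twin-comparison algorithm also tolerates a few low-difference frames inside a gradual transition, and this sketch omits that.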
An adaptive keyframe extraction scheme based on histogram comparison was also proposed. Experiments showed that it generates keyframes that properly represent the content of each shot.
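The abstract does not describe the keyframe rule, so the following is only one plausible reading of "adaptive keyframe extraction based on histogram comparison". Within a shot, the first frame is taken as a keyframe. Each later frame becomes a new keyframe when its histogram differs from the most recent keyframe by more than a threshold. The number of keyframes therefore adapts to how much the shot's content changes. The single-channel input, the 16-bin histogram, the threshold, and all names below are assumptions for illustration.

```python
# Minimal sketch of adaptive keyframe extraction within one shot.
# Assumption (not from the thesis): a frame becomes a keyframe when its
# luminance histogram differs from the most recent keyframe by more than
# `threshold`, so shots with more visual change receive more keyframes.


def histogram(values, bins=16):
    """Count 8-bit values (0-255) into `bins` equal-width bins."""
    counts = [0] * bins
    for v in values:
        counts[min(v * bins // 256, bins - 1)] += 1
    return counts


def hist_diff(h1, h2):
    """Normalized L1 distance between two histograms, in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / sum(h1)


def extract_keyframes(frames, threshold, bins=16):
    """Return keyframe indices for a shot given as a list of pixel lists."""
    if not frames:
        return []
    hists = [histogram(f, bins) for f in frames]
    keys = [0]  # the first frame always represents the shot
    for i in range(1, len(hists)):
        # Compare against the latest keyframe, not the previous frame,
        # so that slow drift eventually triggers a new keyframe.
        if hist_diff(hists[keys[-1]], hists[i]) > threshold:
            keys.append(i)
    return keys
```

Each frame is compared with the last keyframe instead of its immediate predecessor. Without that choice, a slow pan or gradual lighting change would never exceed the threshold and would yield only a single keyframe.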
Keywords/Search Tags: Audiovisual data, Content, Segmentation, Retrieval, Proposed, Rate