
Effective temporal video segmentation and content-based audio-visual video clustering

Posted on: 2004-06-05 | Degree: Ph.D | Type: Dissertation
University: Georgia Institute of Technology | Candidate: Kang, Jung Won | Full Text: PDF
GTID: 1468390011476941 | Subject: Engineering
Abstract/Summary:
Tools that efficiently index, browse, and retrieve video data are needed so that extremely diverse video collections can be accessed without exhaustive searching. To achieve this goal, the first step is temporal video segmentation, and the second step is clustering the segmented video sequence according to its content. For temporal video segmentation, a novel spatial-domain approach to detecting shot changes and sub-shot changes is proposed. The proposed spatial-domain method for shot change detection uses a new pixel-wise difference measurement and an inconsistency measurement of the motion vectors, and it provides high performance in the presence of fast camera or object movement and sudden variations in luminance. The proposed spatial-domain method for sub-shot change detection accurately and efficiently estimates camera movement by using information from extracted background images. To reduce computational complexity, a compressed-domain approach is also proposed by modifying the spatial-domain approach.

For video clustering, audio-visual clustering methods are proposed to classify video sequences into three categories using both audio and visual information. These categories are action scenes, dialogue scenes, and miscellaneous scenes, all of which are high-level semantic entities. First, to cluster a video sequence into action and non-action scenes, motion activity and average shot length are used for the visual classification, and the average energy of the audio sequence is used for the audio classification. Then, to cluster non-action scenes into dialogue and non-dialogue scenes, the time-constrained video clustering method proposed by Yeung and Yeo is modified and applied to the visual information, and a speaker identification and tracking (SDT) method is applied to the audio information. To improve the performance of the clustering and of the SDT system, a face recognition method is combined with both the modified time-constrained video clustering method and the SDT method. As a result, the proposed video clustering method can also identify the actors and actresses in dialogue scenes by applying SDT.
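
The two-step pipeline described in the abstract (segment into shots, then separate action from non-action scenes) can be illustrated with a minimal Python sketch. It is not the dissertation's method: the abstract does not specify its new pixel-wise difference measurement, its motion-vector inconsistency measurement, or how the audio and visual decisions are combined. The sketch therefore substitutes a plain mean absolute pixel difference for cut detection, takes motion activity as a precomputed number, and combines the visual and audio decisions with a simple AND. All function names, parameters, and threshold values are hypothetical.

```python
from statistics import mean


def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel-wise difference of two equally sized grayscale frames.

    Frames are lists of rows of intensities. This is a plain stand-in, not the
    dissertation's pixel-wise difference measurement.
    """
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


def detect_shot_changes(frames, threshold=30.0):
    """Return every index i where a cut is declared between frame i-1 and frame i.

    The threshold is an assumed value chosen for illustration only.
    """
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]


def average_shot_length(num_frames, cuts):
    """Average number of frames per shot, given the detected cut positions."""
    return num_frames / (len(cuts) + 1)


def audio_energy(samples):
    """Average energy (mean of squared amplitudes) of an audio sequence."""
    return mean(s * s for s in samples)


def is_action_scene(avg_shot_len, motion_activity, energy,
                    max_shot_len=50.0, min_motion=0.5, min_energy=0.1):
    """Label a scene as action when both the visual and the audio cue agree.

    Visual cue: short shots and high motion activity. Audio cue: high energy.
    The AND combination and all three thresholds are assumptions.
    """
    visual_vote = avg_shot_len < max_shot_len and motion_activity > min_motion
    audio_vote = energy > min_energy
    return visual_vote and audio_vote
```

For a scene, one would call `detect_shot_changes` on its frames, pass the result to `average_shot_length`, and feed that value together with a motion activity score and `audio_energy` of the soundtrack into `is_action_scene`. In practice the thresholds would have to be tuned on labeled data.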
Keywords/Search Tags:Video, SDT, Proposed, Scenes, Audio, Visual