Effective temporal video segmentation and content-based audio-visual video clustering

Posted on:2004-06-05

Degree:Ph.D

Type:Dissertation

University:Georgia Institute of Technology

Candidate:Kang, Jung Won

GTID:1468390011476941

Subject:Engineering

Abstract/Summary:

Tools that efficiently index, browse, and retrieve video data are needed so that extremely diverse video collections can be accessed without exhaustive searching. To achieve this goal, the first step is temporal video segmentation and the second step is clustering the segmented video sequence according to its content. For temporal video segmentation, a novel spatial-domain approach to detecting shot changes and sub-shot changes is proposed. The proposed spatial-domain method for shot change detection uses a new pixel-wise difference measurement and an inconsistency measurement of the motion vectors, and it achieves high performance in the presence of fast camera/object movement or sudden variations in luminance. The proposed spatial-domain method for sub-shot change detection accurately and efficiently estimates camera movement by using information from extracted background images. To reduce computational complexity, a compressed-domain approach is also proposed by modifying the spatial-domain approach.

For video clustering, audio-visual clustering methods are proposed to classify video sequences into three categories using both audio and visual information. These categories are action scenes, dialogue scenes, and miscellaneous scenes, all of which are high-level semantic entities. First, to cluster a video sequence into action and non-action scenes, motion activity and average shot length are used for the visual classification, and the average energy of the audio sequence is used for the audio classification. Then, to cluster non-action scenes into dialogue and non-dialogue scenes, the time-constrained video clustering method proposed by Yeung and Yeo is modified and applied to the visual information, and a speaker identification and tracking (SDT) method is applied to the audio information. To improve the performance of both the clustering and the SDT system, a face recognition method is combined with the modified time-constrained video clustering method and the SDT method. As a result, the proposed video clustering method can also identify the actors and actresses in dialogue scenes by applying SDT.
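
The two-step pipeline described above (segment the video into shots, then separate action from non-action scenes) can be illustrated with a rough Python sketch. This is not the dissertation's method: the shot detector below is a generic mean-absolute-difference baseline rather than the proposed pixel-wise measurement, motion activity is omitted, and the rule that combines the visual and audio cues is an assumption. All function names and threshold parameters are hypothetical.

```python
from statistics import mean


def frame_difference(frame_a, frame_b):
    """Mean absolute pixel difference between two equal-sized grayscale frames.

    Frames are flat lists of pixel intensities (an assumption for brevity).
    """
    return mean(abs(a - b) for a, b in zip(frame_a, frame_b))


def detect_shot_changes(frames, threshold):
    """Return each index i at which frame i starts a new shot.

    Generic baseline: a shot change is declared when the difference to the
    previous frame exceeds a fixed, hypothetical threshold.
    """
    return [
        i
        for i in range(1, len(frames))
        if frame_difference(frames[i - 1], frames[i]) > threshold
    ]


def average_shot_length(num_frames, boundaries):
    """Average number of frames per shot, given the shot-change indices."""
    return num_frames / (len(boundaries) + 1)


def average_energy(samples):
    """Average energy (mean squared amplitude) of an audio sequence."""
    return mean(s * s for s in samples)


def is_action_scene(shot_length, audio_energy, max_shot_length, min_energy):
    """Assumed decision rule: short shots AND loud audio indicate action.

    The abstract does not say how the cues are combined; requiring both
    is only one plausible choice.
    """
    return shot_length <= max_shot_length and audio_energy >= min_energy


if __name__ == "__main__":
    # Six tiny 4-pixel "frames" forming three visually distinct shots.
    frames = [
        [10, 10, 10, 10], [11, 10, 9, 10],
        [200, 200, 200, 200], [201, 199, 200, 200],
        [50, 50, 50, 50], [50, 51, 50, 49],
    ]
    audio = [0.5, -0.5, 0.5, -0.5]

    boundaries = detect_shot_changes(frames, threshold=30)
    length = average_shot_length(len(frames), boundaries)
    energy = average_energy(audio)
    print(boundaries, length, energy)
    print(is_action_scene(length, energy, max_shot_length=3, min_energy=0.1))
```

In practice the thresholds would have to be tuned on labeled data, and a fixed frame-difference threshold is exactly the kind of detector that struggles with fast motion and sudden luminance changes, which is the weakness the proposed measurements are meant to address.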

Keywords/Search Tags:

Video, SDT, Proposed, Scenes, Audio, Visual

Related items

1	Research On Automatic Video Classification Algorithm Based On Audio-visual Features And Svm
2	Design And Implementation Of The Instant Messenger System Based On P2P Structure
3	Audio-visual scene analysis with application in sports video
4	Multiplexing H.264 video with AAC audio bit streams, demultiplexing and achieving lip synchronization during playback
5	Hierarchical Segmentation of Videos into Shots and Scenes using Visual Content
6	Research And Implementation Of Abnormal Audio Monitoring System For Adaptive Scenes
7	Design And Implementation Of Video Conferencing Systems
8	Speech Endpoint Detection Based On Audio And Visual Features
9	Context visuals in L2 listening tests: The effectiveness of photographs and video vs. audio-only format