Audio Processing In Content-based Video Retrieval

Posted on:2005-08-12

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z Feng

Full Text:PDF

GTID:1118360125967349

Subject:Computer software and theory

Abstract/Summary:

Advances in multimedia and the internet are bringing more and more digital video into people's daily lives, which presents new challenges for managing and searching large video collections. Content-based video processing and retrieval has therefore become a focus of research in multimedia applications, information retrieval and data management.

Video is a medium that integrates visual and audio information. A growing number of researchers have found it difficult to acquire content from visual information alone, while much content information lies in the audio and can be acquired more easily. That is why audio processing is important to content-based video processing and retrieval.

Considering the characteristics of video, this dissertation discusses how to process, analyze and apply audio information in video applications, building on traditional audio processing approaches. Three audio processing techniques are mainly described: audio type classification, speaker information analysis and special audio event detection.

Classifying audio into different types is a fundamental step when applying audio information in video. We propose a new audio type classification algorithm based on the Maximum Entropy Model. It selects effective features automatically and achieves better performance than other algorithms, such as k-NN, GMM and SVM, in complicated audio environments.

People, and speakers in particular, are a common object of interest in video processing and retrieval. A speaker information analysis framework is described in this dissertation. Within this framework, a speaker segmentation and clustering algorithm based on an approximate KL distance between GMMs is proposed.

Special audio events are typically associated with special events in video. We propose two algorithms to detect cheering and whistles, respectively, in sports videos. In addition, a video association mining algorithm is used to fuse the content features acquired from audio and video. This algorithm can be used to detect sports video events and build a video index of events.

Besides the three techniques above, we also describe other audio processing algorithms adopted in Fudan's Video System, which participated in the TRECVID Evaluations.
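
The abstract mentions a speaker segmentation and clustering algorithm based on an approximate KL distance between GMMs, but it does not say which approximation is used. The KL divergence between two Gaussian mixtures has no closed form, so some approximation is needed. As a rough illustration only, the sketch below assumes diagonal-covariance GMMs and a matching-based approximation (each component of one mixture is matched to its closest component in the other), made symmetric by summing both directions. The function names and the `(weight, mean, variance)` representation are hypothetical and are not taken from the dissertation.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def approx_kl_gmm(f, g):
    """Matching-based approximation of KL(f || g) between two GMMs.

    Each GMM is a list of (weight, mean, variance) tuples, where mean and
    variance are per-dimension lists. This is an assumed stand-in, not
    necessarily the approximation used in the dissertation.
    """
    total = 0.0
    for w_f, mu_f, var_f in f:
        # Match this component of f to the "closest" component of g.
        best = min(
            kl_diag_gauss(mu_f, var_f, mu_g, var_g) + math.log(w_f / w_g)
            for w_g, mu_g, var_g in g
        )
        total += w_f * best
    return total


def symmetric_distance(f, g):
    """Symmetrized approximate KL distance between two GMMs."""
    return approx_kl_gmm(f, g) + approx_kl_gmm(g, f)


# Toy speaker models: two-component GMMs over 2-D features (made-up values).
gmm_a = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])]
gmm_b = [(0.5, [0.5, 0.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])]
gmm_c = [(0.5, [10.0, 10.0], [1.0, 1.0]), (0.5, [20.0, 20.0], [1.0, 1.0])]

print(symmetric_distance(gmm_a, gmm_b))
print(symmetric_distance(gmm_a, gmm_c))
```

In a clustering setting, a distance of this kind could be computed between GMMs trained on adjacent audio segments, with a small value suggesting the same speaker. Thresholds and the merging strategy would depend on the data and are not specified in the abstract.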

Keywords/Search Tags:

audio processing, audio type classification, speaker segmentation, speaker clustering, audio event detection, content-based video retrieval, video processing

Related items

1	Research On Acoustic Feature Analysis In Audio Retrieval
2	A Speaker Identification System For Video Content Analysis
3	Automatic Segmentation And Clustering Of Multi-genre Audio Method Research And Implementation
4	Wireless Intercom Audio Of Speaker Segmentation And Clustering Research
5	Research On Audio-Visual Media Content Processing And Analyzing For Scene Understanding
6	Study On Content-based Audio Retrieval Technology
7	An Audio Classification Algorithm For News Video Retrieval
8	Robust Speaker Modeling in Non-Neutral Environments with Application to Large Scale Multi-Speaker Audio Stream
9	Design And Implementation Of Audio And Video Processing System Based On DM642
10	Study On Key Techniques Of Content-Based Audio Retrieval (CBAR)