
Audio Processing In Content-based Video Retrieval

Posted on: 2005-08-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Feng
GTID: 1118360125967349
Subject: Computer software and theory
Abstract:
Advances in multimedia and the Internet bring more and more digital video into people's daily lives, which presents new challenges for managing and searching large video collections. Content-based video processing and retrieval have therefore become a research focus in multimedia applications, information retrieval and data management.

Video is a medium that integrates visual and audio information. More and more researchers have found that it is difficult to acquire content from visual information alone, whereas much of the content information lies in the audio track and can be acquired more easily. This is why audio processing is important to content-based video processing and retrieval.

Considering the characteristics of video, this dissertation discusses how to process, analyze and apply audio information in video applications, building on traditional audio processing approaches. Three audio processing techniques are mainly described: audio type classification, speaker information analysis and special audio event detection.

Classifying audio into different types is a fundamental step when applying audio information in video. We propose a new audio type classification algorithm based on the Maximum Entropy Model. It selects effective features automatically and, in complicated audio environments, achieves better performance than other algorithms such as k-NN, GMM and SVM.

Humans, especially speakers, are common objects in video processing and retrieval. This dissertation describes a speaker information analysis framework, within which a speaker segmentation and clustering algorithm based on an approximate KL distance between GMMs is proposed.

Special audio events are typically related to special events in video. We propose two algorithms to detect cheering and whistles, respectively, in sports videos. In addition, a video association mining algorithm is used to fuse the content features acquired from audio and video. This algorithm can be used to detect sports video events and to build a video index of events.
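The speaker segmentation and clustering algorithm above is built on an approximate KL distance between GMMs, but the abstract does not say which approximation is used. As a minimal sketch, assuming diagonal-covariance GMMs and substituting the matching-based approximation of Goldberger et al. (2003), which is not necessarily the dissertation's own method, such a distance could be computed as follows. The names `DiagGMM`, `kl_diag_gauss`, `kl_gmm_matching` and `gmm_distance` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class DiagGMM:
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # diagonal covariance (variance vector) per component


def kl_diag_gauss(m1, v1, m2, v2) -> float:
    """Exact KL(N(m1, diag v1) || N(m2, diag v2)) between diagonal Gaussians."""
    return 0.5 * sum(
        math.log(b / a) + (a + (x - y) ** 2) / b - 1.0
        for x, a, y, b in zip(m1, v1, m2, v2)
    )


def kl_gmm_matching(f: DiagGMM, g: DiagGMM) -> float:
    """Matching-based approximation of KL(f || g) (assumed stand-in).

    Each component a of f is paired with the component b of g that minimises
    KL(f_a || g_b) - log w_g[b]; the weighted pair terms are then summed:
    sum_a w_f[a] * (KL(f_a || g_b) + log(w_f[a] / w_g[b])).
    """
    total = 0.0
    for wa, ma, va in zip(f.weights, f.means, f.variances):
        best = min(
            kl_diag_gauss(ma, va, mb, vb) - math.log(wb)
            for wb, mb, vb in zip(g.weights, g.means, g.variances)
        )
        total += wa * (best + math.log(wa))
    return total


def gmm_distance(f: DiagGMM, g: DiagGMM) -> float:
    """Symmetrised approximate KL distance, usable for change detection or clustering."""
    return kl_gmm_matching(f, g) + kl_gmm_matching(g, f)


if __name__ == "__main__":
    unit = [[1.0, 1.0], [1.0, 1.0]]
    f = DiagGMM([0.5, 0.5], [[0.0, 0.0], [5.0, 5.0]], unit)
    g = DiagGMM([0.5, 0.5], [[1.0, 1.0], [6.0, 6.0]], unit)
    print(gmm_distance(f, f))  # 0.0 for identical models
    print(gmm_distance(f, g))  # approximately 2.0
```

In a segmentation pass, one would fit a GMM to each of two adjacent audio windows and flag a candidate speaker change where the distance peaks; for clustering, the same distance can drive agglomerative merging of segment models. Window lengths, model sizes and thresholds are left open here because the abstract does not give them.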
Besides the three techniques above, we also describe other audio processing algorithms adopted in Fudan's Video System, which participated in the TRECVID evaluations.
Keywords: audio processing, audio type classification, speaker segmentation, speaker clustering, audio event detection, content-based video retrieval, video processing