
A Speaker Identification System For Video Content Analysis

Posted on: 2009-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Bi
Full Text: PDF
GTID: 2178360245969837
Subject: Signal and Information Processing
Abstract/Summary:
Advances in multimedia, the Internet and mass storage are bringing more and more digital video into daily life. This presents new challenges for large-scale video data sharing, non-linear video editing, high-level semantic analysis and video retrieval, which in turn drive research on content-based video parsing.

Because of the current limitations of machine vision and pattern recognition, many researchers have found that it is still difficult to automatically extract high-level semantic structure from a general video stream using visual information alone. Audio, as another time-dependent medium in video documents, can supplement visual information and supply a unique cue for video content analysis. Moreover, audio carries a large amount of content information that can be acquired more easily. For these reasons, a growing body of recent literature proposes applying audio content analysis techniques to content-based video parsing.

This dissertation presents our current work on a speaker identification system for video content analysis. It differs from a conventional speaker identification system in the following respects. First, the soundtrack extracted from a video stream includes not only silence and speech, but also music and environmental sound. Second, the number of speakers in the video content is uncertain, and clean training data for each speaker is unavailable. Third, noise in the video can significantly degrade system performance.

Given these considerations, our speaker identification system architecture consists of the following basic parts: audio classification and segmentation using a rule-based and Support Vector Machine (SVM) based classifier; speech clustering using spectral clustering techniques; speaker identification based on the Gaussian Mixture Model (GMM); and speech enhancement based on spectral subtraction.

The research work and results of this dissertation can be summarized as follows:

(1) The principles and algorithms of speaker identification are studied. Speaker identification based on the Gaussian Mixture Model is implemented, and the feasibility of this method is validated.

(2) SVM is grounded in Vapnik-Chervonenkis (VC) dimension theory and is noted for its generalization ability. Support vectors and kernel functions are introduced in Chapter 3. A three-SVM audio classification framework is proposed and implemented, and its feasibility is validated by the results obtained.

(3) The principles and algorithms of speech enhancement are studied. The implementation of spectral subtraction confirms the validity of this method in the speaker identification system for video content analysis.

Experiments have been carried out on datasets extracted from news videos, conversation videos and movie videos. The results obtained confirm the validity of the proposed system architecture.
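
As an illustration of the GMM-based identification step named in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes each speaker is already represented by a diagonal-covariance GMM, given as a list of (weight, mean, variance) tuples, and picks the speaker whose model gives the highest average log-likelihood over the feature frames of a speech segment. The function names, the toy two-dimensional models and the frame values are all hypothetical; a real system would use trained models and acoustic features such as cepstral coefficients.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_speaker(frames, speaker_models):
    """Return (best_speaker, scores) using average frame log-likelihood."""
    scores = {
        name: sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models: two speakers, two mixture components each.
speaker_models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Hypothetical feature frames from one speech segment.
frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]

best, scores = identify_speaker(frames, speaker_models)
print(best, scores)
```

Averaging the log-likelihood over frames keeps scores comparable between segments of different lengths, which matters when segments come from an upstream segmentation and clustering stage.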
Keywords/Search Tags: video content analysis, audio classification and segmentation, speech clustering, speaker identification, speech enhancement