
Gaussian Mixture Model-based Speaker Recognition Study

Posted on: 2009-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: W Jiang
Full Text: PDF
GTID: 2208360245961229
Subject: Information and Communication Engineering
Abstract/Summary:
Speaker recognition is the process of automatically recognizing who is speaking by using the speaker-specific information contained in the speech signal. In general, it can be classified as text-independent or text-dependent according to the recognition conditions. This thesis focuses on text-independent speaker recognition based on Gaussian mixture models (GMM).

First, the thesis introduces the vocal tract model from the acoustic theory of speech production and, from it, the all-pole model of the speech signal. It then uses the all-pole model to study the production of voiced sounds, unvoiced fricatives, and unvoiced plosives.

Second, we study feature extraction. We introduce the linear prediction algorithm used to compute the parameters of the all-pole model, and give several prediction-derived parameters, including reflection coefficients, linear prediction cepstrum coefficients, and log area ratio coefficients. We also introduce Mel-warped cepstrum coefficients and sub-cepstrum coefficients. We compare the performance of these features for speaker recognition; the cepstrum coefficients represent speaker characteristics more accurately and therefore give better recognition performance.

Next, we study the theory of the GMM. The individual Gaussian components of a GMM are shown to represent general speaker-dependent spectral shapes that are effective for modeling speaker identity; these spectral shapes correspond to speech classes, for example phonemes. The thesis discusses estimation of the GMM parameters, initialization, and the classification decision.
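The prediction-derived features named above can be illustrated with a short sketch. It is not the thesis implementation: it assumes the autocorrelation method solved by the Levinson-Durbin recursion, a predictor polynomial written as A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, and one common sign convention for log area ratios. The function names are hypothetical.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for an all-pole model.

    r     : autocorrelation values r[0..order]
    Returns (a, k, err): predictor polynomial a (a[0] = 1), reflection
    coefficients k[0..order-1], and the final prediction error energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        ki = -acc / err
        k[i - 1] = ki
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + ki * a_prev[i - j]
        a[i] = ki
        err *= 1.0 - ki * ki
    return a, k, err

def lpc_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of 1/A(z), from the standard recursion
    c[n] = -a[n] - (1/n) * sum_{m=1}^{n-1} m * c[m] * a[n-m]."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)  # c[0] (gain term) is left at zero here
    for n in range(1, n_ceps + 1):
        acc = 0.0
        for m in range(1, n):
            if n - m <= p:
                acc += m * c[m] * a[n - m]
        a_n = a[n] if n <= p else 0.0
        c[n] = -a_n - acc / n
    return c[1:]

def log_area_ratios(k):
    """Log area ratios from reflection coefficients (requires |k| < 1).
    The sign convention is an assumption; texts differ on it."""
    k = np.asarray(k, dtype=float)
    return np.log((1.0 - k) / (1.0 + k))
```

In practice `r` would be the autocorrelation of a windowed speech frame, and the cepstrum vector would be computed once per frame to form the feature sequence for a speaker.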
The thesis also makes some improvements to the GMM: using diagonal covariance matrices, which improves computational performance, and variance limiting, which avoids model singularities and the resulting loss of recognition performance.

We design and implement an automatic speaker recognition system for a complete experimental evaluation of the GMM.

Finally, a complete experimental evaluation of the GMM is conducted on a 36-speaker database. The experiments examine model initialization, model order selection, and large-population performance. Some observations and conclusions are: the identification performance of the GMM is insensitive to the method of model initialization; there appears to be a minimum model order needed to adequately model speakers and achieve good identification performance (sixteen for this 12-speaker database); and the GMM maintains high identification performance with increasing population size if there is enough training and test speech data (more than 90 seconds of training speech and more than 5 seconds of test speech).
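The diagonal-covariance and variance-limiting ideas described above, together with the maximum-likelihood identification decision, can be sketched as follows. This is a minimal illustration under stated assumptions, not the system built in the thesis: the EM loop, the random initialization from training frames, the floor value `var_floor`, the iteration count, and all function names are hypothetical choices.

```python
import numpy as np

def joint_and_total(X, weights, means, variances):
    """Return log(w_m * N(x_t | mu_m, diag var_m)) with shape (T, M)
    and the per-frame mixture log-likelihood with shape (T,)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]            # (T, M, D)
    quad = np.sum(diff ** 2 / variances[None], axis=2)  # (T, M)
    log_det = np.sum(np.log(variances), axis=1)         # (M,)
    joint = np.log(weights)[None, :] - 0.5 * (
        D * np.log(2.0 * np.pi) + log_det[None, :] + quad)
    mx = joint.max(axis=1, keepdims=True)               # log-sum-exp trick
    total = mx[:, 0] + np.log(np.exp(joint - mx).sum(axis=1))
    return joint, total

def train_gmm(X, n_components, n_iter=20, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM with EM and a variance floor."""
    rng = np.random.default_rng(seed)
    T, _ = X.shape
    # Assumed initialization: random training frames as initial means.
    means = X[rng.choice(T, n_components, replace=False)].copy()
    variances = np.maximum(
        np.tile(X.var(axis=0), (n_components, 1)), var_floor)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        joint, total = joint_and_total(X, weights, means, variances)
        resp = np.exp(joint - total[:, None])           # posteriors (T, M)
        nk = resp.sum(axis=0) + 1e-10                   # guard empty components
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        second = (resp.T @ (X ** 2)) / nk[:, None]
        # Variance limiting: clamp each variance to avoid singularities.
        variances = np.maximum(second - means ** 2, var_floor)
    return weights, means, variances

def identify(X, models):
    """Pick the speaker whose model gives the highest total log-likelihood.
    models maps a speaker name to (weights, means, variances)."""
    scores = {name: joint_and_total(X, *params)[1].sum()
              for name, params in models.items()}
    return max(scores, key=scores.get), scores
```

With diagonal covariances each component needs only D variances instead of a full D-by-D matrix, so the quadratic term reduces to an elementwise division. The floor keeps any variance from shrinking toward zero when a component captures very few frames.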
Keywords/Search Tags: speaker recognition, speech model, LPC, MFCC, GMM