A Research On Speaker Recognition Algorithm And Speaker Identification System Implementation

Posted on: 2011-05-07 | Degree: Master | Type: Thesis
Country: China | Candidate: S Q Yang
GTID: 2178360305977863 | Subject: Computer application technology
Abstract/Summary:
Speaker recognition is the most natural biometric identification technique. It can be divided into two sub-fields: speaker identification and speaker verification. Speaker recognition automatically identifies a speaker from the characteristics embodied in that speaker's speech signal; the key issues are the choice of feature parameters and the recognition model. At present, linear predictive coding (LPC) parameters, the LPC cepstrum (LPCC) and mel-frequency cepstral coefficients (MFCC) are often used as feature parameters in speaker recognition, and the popular recognition models include dynamic time warping (DTW), vector quantization (VQ) and the hidden Markov model (HMM).
LPCC represents the physiological differences of the speaker's vocal tract, while MFCC exploits the non-linear frequency characteristics of the human auditory system, that is, its speech perception characteristics. The Hilbert-Huang Transform (HHT) was proposed in 1998. Owing to its strong adaptive, time-varying processing capability for non-stationary and non-linear signals, HHT rapidly received wide attention and has been applied successfully in many areas of signal processing. HHT is also a new tool for speech signal processing. Each of the above speech or speaker features (LPCC, MFCC or HHT) has its own advantages, but none of them applied alone is enough to describe the speaker's discriminative characteristics. Each of these features may contain semantic information as well as speaker characteristics, so using these diverse features together may be the best way to construct a reliable speaker recognition system.
Based on the above analysis, in the speaker recognition experiments LPCC, MFCC and HHT features are each supplied separately to the speaker recognition system, and then the combined features of MFCC and HHT are used. In the experiments of this thesis, Matlab is the development environment.
LPCC, MFCC, HHT and combined features are extracted or constructed from the speech signal and then supplied to several popular models: Dynamic Time Warping (DTW), Hidden Markov Model (HMM) and Gaussian Mixture Model (GMM). In addition, GMMs with different numbers of Gaussian components are tested to compare recognition performance.
The results show that, for speaker identification, HHT features give a better recognition rate than LPCC and MFCC; that with combined features GMM performs better than DTW or DHMM; and that the combined features are superior to any single feature (LPCC, MFCC or HHT). It is shown that HHT features can be used as new parameters in speaker recognition when they are combined with MFCC features: the combined features may simultaneously carry the dynamic time characteristics of MFCC and the high frequency resolution of HHT, and may therefore improve system performance. GMM may be the best recognition model for a speaker identification system.
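As an illustration of the simplest of the models named above, the following is a minimal Python sketch of DTW-based speaker identification. It is not the thesis's Matlab implementation: the function names, the Euclidean frame distance, the unconstrained warping path and the one-template-per-speaker setup are all assumptions made for the example, and the feature vectors stand in for whatever LPCC, MFCC or HHT frames a front end would produce.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature frames.

    Uses the basic step pattern (insertion, deletion, match) with no
    path constraint; a real system would usually add one.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def identify_speaker(test_seq, templates):
    """Return the name of the template with the smallest DTW cost.

    `templates` is a hypothetical dict mapping a speaker name to one
    reference sequence of feature frames.
    """
    return min(templates, key=lambda name: dtw_distance(test_seq, templates[name]))


if __name__ == "__main__":
    # Toy one-dimensional "features"; real frames would be LPCC/MFCC/HHT vectors.
    templates = {
        "alice": [[0.0], [1.0], [2.0]],
        "bob": [[5.0], [6.0], [7.0]],
    }
    test = [[0.1], [0.9], [1.0], [2.1]]
    print(identify_speaker(test, templates))
```

The sketch classifies by nearest template, so the test utterance may differ in length from the reference. Under the same assumptions, a GMM-based identifier would instead fit one mixture per speaker and pick the speaker whose model gives the highest likelihood for the test frames.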
Keywords/Search Tags: Speaker Identification, Hidden Markov Model (HMM), Hilbert-Huang Transform (HHT), Mel-Frequency Cepstral Coefficients (MFCC)