
Research On Text-Independent Speaker Recognition

Posted on: 2009-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: H X Gao
Full Text: PDF
GTID: 2178360245979950
Subject: Computer application technology
Abstract/Summary:
Speaker recognition is the process of automatically recognizing a person from their speech. Because of its promising applications, it has become a focus of research in identity authentication and artificial intelligence. From the 1930s to date, many excellent speaker recognition models have been built.

Speaker recognition consists of two steps: feature extraction and pattern matching. Building on both, this thesis develops improved feature parameters and matching methods to raise the performance of a speaker recognition system, especially its recognition rate and stability. The main work is as follows:

Feature Extraction: Parameters suitable for speaker recognition are analyzed and compared, including Linear Prediction Cepstrum Coefficients (LPCC), Mel Frequency Cepstrum Coefficients (MFCC), and acoustic dynamic features.

Pattern Matching Method: Vector Quantization (VQ) and the Gaussian Mixture Model (GMM), both proven to be effective pattern matching methods for speaker recognition, are studied and improved.

Based on the above research, three speaker recognition algorithms are proposed:

(1) A speaker recognition algorithm based on MFCC + centroid and VQ

Following many experiments on Vector Quantization, covering the choice of speech features, the codebook size, the quantization distortion, and so on, a speaker recognition algorithm based on MFCC + centroid and VQ is proposed. In this algorithm, 12 MFCCs (C0 is excluded) and the centroid of each frame are extracted as the speaker's audio feature parameters. The method is simple and effective, but its recognition rate is low in some cases where the speech is short.

(2) A speaker recognition algorithm based on MFCC+ΔMFCC and GMM

To address the weakness noted in (1), a Gaussian Mixture Model is used to obtain a better algorithm: a speaker recognition algorithm based on MFCC+ΔMFCC and GMM. Here speech signals are characterized by a 24-dimensional feature vector consisting of components C2-C13 of the MFCCs and their differential coefficients ΔMFCC. Many experiments show that this method reaches a higher recognition rate, especially on short speech. Its disadvantages are low recognition speed and unstable recognition results.

(3) A stable and effective speaker recognition algorithm based on VQ-GMM

To solve the problem of unstable recognition results in the MFCC+ΔMFCC and GMM algorithm described in (2), a stable and effective speaker recognition algorithm based on VQ-GMM is proposed. It combines Vector Quantization and the Gaussian Mixture Model: the initial GMM parameters are obtained by Vector Quantization. The same MFCC+ΔMFCC feature vectors are used in this algorithm. Test results show that the approach gives more reliable and reasonable performance than the traditional system when the speech signal is short.

The above algorithms performed satisfactorily in speaker identification experiments on our voice database of 50 speakers. They can therefore be applied in different settings.

Current research on speaker recognition targets general-purpose use rather than a single kind of application. Future work should therefore concentrate on finding robust speech features that can be extracted in real time, and on more effective matching methods. In addition, speaker recognition should be studied from a more practical perspective in order to meet market requirements.
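Algorithm (1) matches a test utterance against per-speaker VQ codebooks. The Python sketch below is one plausible illustration of that matching step, not the implementation used in the thesis. It assumes squared Euclidean distance as the distortion measure and chooses the speaker whose codebook yields the lowest average quantization distortion. The names `avg_distortion` and `identify` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def avg_distortion(frames: List[Vector], codebook: List[Vector]) -> float:
    """Mean distance from each frame to its nearest codeword."""
    if not frames:
        raise ValueError("need at least one feature frame")
    total = sum(min(sq_dist(f, c) for c in codebook) for f in frames)
    return total / len(frames)


def identify(frames: List[Vector], codebooks: Dict[str, List[Vector]]) -> str:
    """Return the speaker whose codebook quantizes the frames with least distortion."""
    return min(codebooks, key=lambda name: avg_distortion(frames, codebooks[name]))
```

With very few frames the average is taken over a small sample, which may be one reason a distortion-based decision becomes less reliable on short speech.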
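Algorithm (2) describes each frame by 12 MFCCs and their differential coefficients, giving 24 dimensions. The abstract does not say how ΔMFCC is computed. The sketch below assumes the common regression formula over a window of ±2 frames, with the first and last frames repeated as padding. The function names are illustrative, and the MFCC frames are taken as already computed.

```python
from typing import List

Frame = List[float]


def delta(frames: List[Frame], width: int = 2) -> List[Frame]:
    """Regression-based delta coefficients.

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    Indices past either end are clamped, i.e. edge frames are repeated.
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out: List[Frame] = []
    for t in range(len(frames)):
        row: Frame = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                num += n * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def stack_static_and_delta(frames: List[Frame], width: int = 2) -> List[Frame]:
    """Concatenate each static frame with its delta: 12 MFCCs become 24 values."""
    return [static + d for static, d in zip(frames, delta(frames, width))]
```

The delta half of the vector captures how the cepstrum changes from frame to frame, which is the "acoustic dynamic feature" the abstract refers to.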
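Algorithm (3) obtains the initial GMM parameters from Vector Quantization. The sketch below shows one way this could work, under assumptions the abstract does not confirm: the codebook is trained with plain k-means started from evenly spaced training vectors, covariances are diagonal, and a small variance floor is applied. Each codeword becomes a component mean, the share of vectors in its cell becomes the mixture weight, and the spread within the cell becomes the variance. EM refinement would follow and is not shown.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, codebook: List[Vector]) -> int:
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))


def train_codebook(data: List[Vector], size: int, iters: int = 20) -> List[Vector]:
    """Plain k-means codebook training (assumed; the thesis may use another scheme)."""
    if size < 1 or size > len(data):
        raise ValueError("codebook size must be between 1 and len(data)")
    step = len(data) / size
    # Deterministic start: evenly spaced training vectors (an assumption).
    codebook = [list(data[int(i * step)]) for i in range(size)]
    for _ in range(iters):
        cells: List[List[Vector]] = [[] for _ in range(size)]
        for v in data:
            cells[nearest(v, codebook)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def gmm_init_from_vq(
    data: List[Vector], codebook: List[Vector], var_floor: float = 1e-3
) -> Tuple[List[float], List[Vector], List[Vector]]:
    """Derive initial (weights, means, diagonal variances) from a VQ partition."""
    cells: List[List[Vector]] = [[] for _ in codebook]
    for v in data:
        cells[nearest(v, codebook)].append(v)
    weights, means, variances = [], [], []
    for codeword, cell in zip(codebook, cells):
        weights.append(len(cell) / len(data))
        means.append(list(codeword))
        if cell:
            # Per-dimension spread of the cell around its codeword, floored.
            var = [
                max(sum((x - m) ** 2 for x in col) / len(cell), var_floor)
                for col, m in zip(zip(*cell), codeword)
            ]
        else:
            var = [var_floor] * len(codeword)
        variances.append(var)
    return weights, means, variances
```

Starting EM from a data-driven partition rather than random values makes training repeatable, which is consistent with the stability improvement the abstract reports for VQ-GMM.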
Keywords/Search Tags: Speaker Recognition, feature extraction, pattern matching, Vector Quantization, Gaussian Mixture Model