Complex Channel Speaker Recognition

Posted on: 2008-11-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Guo
GTID: 1118360242964757
Subject: Signal and Information Processing
Abstract/Summary:
This thesis focuses on text-independent speaker recognition under session variability. It explores ways to obtain a high identification rate while keeping the computational load low, in four areas: 1) front-end feature extraction; 2) the test algorithm of the GMM_UBM system; 3) SVM-based systems; 4) session variability reduction. Novel and efficient algorithms are proposed in each area, as summarized below.
Speaker recognition usually adopts acoustic features extracted with a fixed frame length, and voiced and unvoiced sounds are treated equally. However, unvoiced sound behaves much like white noise, whereas voiced sound is periodic and reflects the movement of the vocal tract, so it carries more speaker information. A variable frame length feature extraction procedure is therefore proposed, and in model training the features of voiced sounds are given more weight than those of unvoiced sounds. This dynamic feature reduces the EER by 10% relative to the traditional fixed frame length feature.
The GMM_UBM is the state-of-the-art system in speaker recognition, and its test procedure is based on the log-likelihood. In this thesis, the angle of the model divergence is introduced to replace the log-likelihood in the test procedure, and it achieves almost the same results as the log-likelihood. Furthermore, the scores of the two systems can be fused, and the EER of the fused system is 12%-15% lower than that of either system alone.
In recent years the SVM has made encouraging progress in speaker recognition. Three methods are discussed to improve the SVM-based system. 1) An optimized GMM mean supervector and a weight supervector are proposed as the input of the SVM; the proposed supervector outperforms the traditional GMM mean supervector by 20% in EER. 2) The model divergence and the angle of the model divergence are also proposed as inputs to the SVM, and they can be combined with the GLDS kernel to improve performance.
3) Because data from target speakers is scarce, SVM training frequently encounters the class imbalance problem, which can severely degrade performance. Two strategies are proposed to select impostor samples for SVM training: the first uses the model divergence, and the second uses the SVM itself to select suitable impostor speakers. In addition, the speech of each target speaker can be cut into two or more parts to increase the number of target samples.
Channel or session variability is the most important factor degrading speaker recognition. Variability of the communication channel and the handset introduces a bias into the original speech signal, and this bias degrades the speaker recognition system. Three methods are proposed to estimate the channel space in the GMM supervector domain: EM, PCA and NAP. After the channel space is estimated, feature mapping is applied to the features. The feature mapping system reduces the EER by up to 22% relative to the baseline system.
Factor analysis is one of the most effective methods for removing channel bias. In this thesis, MAP adaptation and factor analysis are integrated to reduce the complexity of factor analysis, and feature mapping is applied after the channel space is estimated. Factor analysis is thus used as a front-end processing algorithm, so the GMM_UBM framework can be used unchanged. The EER of the factor analysis system decreases by 40% without any additional computation. In addition, factor analysis can be combined with the SVM, and the resulting system outperforms the NAP-based SVM system.
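The abstract does not define "the angle of the model divergence". The sketch below assumes one plausible reading: the cosine of the angle between the target model's and the test utterance's MAP-adapted mean offsets from the UBM, after the KL-divergence-style scaling commonly used for GMM mean supervectors. For comparison it also computes the usual average log-likelihood ratio of the GMM_UBM test, and it exposes the mean supervector that an SVM could take as input. The relevance factor 16, all function names and the toy data are hypothetical, and this is not presented as the thesis's exact method.

```python
import numpy as np

def log_joint(X, weights, means, variances):
    """log w_c + log N(x_t | m_c, diag(v_c)) for every frame t and component c, shape (T, C)."""
    diff = X[:, None, :] - means[None, :, :]
    return np.log(weights) - 0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                                    + (diff ** 2 / variances).sum(axis=2))

def avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood, using a stable log-sum-exp over components."""
    lj = log_joint(X, weights, means, variances)
    m = lj.max(axis=1)
    return float(np.mean(m + np.log(np.exp(lj - m[:, None]).sum(axis=1))))

def map_adapt(X, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of UBM weights and means (variances are kept from the UBM)."""
    lj = log_joint(X, weights, means, variances)
    gamma = np.exp(lj - lj.max(axis=1, keepdims=True))
    gamma /= gamma.sum(axis=1, keepdims=True)           # frame posteriors, shape (T, C)
    n = gamma.sum(axis=0)                               # zeroth-order statistics
    ex = (gamma.T @ X) / np.maximum(n, 1e-10)[:, None]  # posterior-weighted mean per component
    alpha = n / (n + relevance)                         # data-dependent adaptation coefficient
    new_means = alpha[:, None] * ex + (1.0 - alpha)[:, None] * means
    new_w = alpha * n / len(X) + (1.0 - alpha) * weights
    return new_w / new_w.sum(), new_means

def mean_supervector(means, ubm_w, ubm_var):
    """Stack means scaled by sqrt(w_c) / sigma_c (KL-divergence-style normalization)."""
    return (np.sqrt(ubm_w)[:, None] * means / np.sqrt(ubm_var)).ravel()

def divergence_angle(tgt_means, test_means, ubm_means, ubm_w, ubm_var):
    """Hypothetical score: cosine of the angle between two models' offsets from the UBM."""
    a = mean_supervector(tgt_means - ubm_means, ubm_w, ubm_var)
    b = mean_supervector(test_means - ubm_means, ubm_w, ubm_var)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy usage: a 4-component UBM; the target speaker's frames are shifted by +0.8.
rng = np.random.default_rng(0)
C, D = 4, 2
ubm_w, ubm_m, ubm_v = np.full(C, 1.0 / C), rng.normal(size=(C, D)) * 3, np.ones((C, D))

def sample(n, shift):
    return ubm_m[rng.integers(0, C, n)] + rng.normal(size=(n, D)) + shift

tgt_w, tgt_m = map_adapt(sample(400, 0.8), ubm_w, ubm_m, ubm_v)
scores = {}
for name, shift in [("target", 0.8), ("impostor", -0.8)]:
    X = sample(200, shift)
    llr = avg_loglik(X, tgt_w, tgt_m, ubm_v) - avg_loglik(X, ubm_w, ubm_m, ubm_v)
    _, test_m = map_adapt(X, ubm_w, ubm_m, ubm_v)
    scores[name] = (llr, divergence_angle(tgt_m, test_m, ubm_m, ubm_w, ubm_v))
```

Since both scores come from the same adapted models, fusing them (for example a weighted sum after score normalization, with weights tuned on development data) adds little cost.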
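The abstract names NAP and factor analysis without giving formulas. A minimal sketch of both under common textbook assumptions: NAP takes the channel subspace to be the leading principal directions of within-speaker supervector scatter and removes it with the projection P = I - U U^T; the factor-analysis part estimates only channel factors (no speaker term) with the usual point estimate x = (I + U^T S^-1 N U)^-1 U^T S^-1 F~, where N holds zeroth-order statistics, F~ the centred first-order statistics and S the UBM covariances, and then subtracts the posterior-weighted channel offset from every frame. This is not the thesis's MAP-integrated variant; the loading matrix `U_fa`, the function names and all data are hypothetical.

```python
import numpy as np

def nap_projection(supervectors, speaker_ids, k):
    """Estimate a k-dim channel subspace from within-speaker scatter; return v -> (I - U U^T) v."""
    sv = np.asarray(supervectors, dtype=float)
    ids = np.asarray(speaker_ids)
    resid = np.empty_like(sv)
    for s in np.unique(ids):
        mask = ids == s
        resid[mask] = sv[mask] - sv[mask].mean(axis=0)  # remove speaker mean, keep session variation
    _, _, vt = np.linalg.svd(resid, full_matrices=False)
    U = vt[:k].T                                         # orthonormal basis of the channel subspace
    return lambda v: v - U @ (U.T @ v)

def frame_posteriors(X, weights, means, variances):
    """Component occupation probabilities gamma_t(c) under a diagonal-covariance GMM."""
    diff = X[:, None, :] - means[None, :, :]
    lj = np.log(weights) - 0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                                  + (diff ** 2 / variances).sum(axis=2))
    g = np.exp(lj - lj.max(axis=1, keepdims=True))
    return g / g.sum(axis=1, keepdims=True)

def channel_factors(gamma, X, ubm_means, ubm_var, U):
    """Point estimate x = (I + U^T S^-1 N U)^-1 U^T S^-1 F~ (channel term only)."""
    C, D = ubm_means.shape
    n = gamma.sum(axis=0)                              # zeroth-order statistics N_c
    f = gamma.T @ X - n[:, None] * ubm_means           # centred first-order statistics F~_c
    inv_var = (1.0 / ubm_var).ravel()                  # diagonal of S^-1 in supervector order
    w = np.repeat(n, D) * inv_var                      # diagonal of N S^-1
    precision = np.eye(U.shape[1]) + U.T @ (U * w[:, None])
    return np.linalg.solve(precision, U.T @ (inv_var * f.ravel()))

def compensate_features(gamma, X, U, x):
    """Subtract the posterior-weighted channel offset from every frame."""
    offset = (U @ x).reshape(gamma.shape[1], -1)       # per-component shift, shape (C, D)
    return X - gamma @ offset

# Toy factor-analysis usage: two-component UBM, every frame shifted by +1.5 in both dimensions.
rng = np.random.default_rng(1)
ubm_w = np.array([0.5, 0.5])
ubm_m = np.array([[-4.0, 0.0], [4.0, 0.0]])
ubm_v = np.ones((2, 2))
U_fa = np.ones((4, 1))                                 # hypothetical loading: one factor shifts every mean by (x, x)
idx = rng.integers(0, 2, 500)
X = ubm_m[idx] + rng.normal(size=(500, 2)) + 1.5       # simulated channel shift of +1.5
gamma = frame_posteriors(X, ubm_w, ubm_m, ubm_v)
x_hat = channel_factors(gamma, X, ubm_m, ubm_v, U_fa)
X_clean = compensate_features(gamma, X, U_fa, x_hat)

# Toy NAP usage: 3 speakers x 10 sessions, session variation along one hidden direction d.
speakers = np.repeat(np.arange(3), 10)
d = rng.normal(size=6)
d /= np.linalg.norm(d)
svs = (5 * rng.normal(size=(3, 6))[speakers]
       + rng.normal(scale=3.0, size=(30, 1)) * d
       + rng.normal(scale=0.1, size=(30, 6)))
project = nap_projection(svs, speakers, k=1)
```

Because the compensation is applied to the features, enrollment and testing can reuse an ordinary GMM_UBM pipeline, which matches the front-end role the abstract describes for factor analysis.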
Keywords/Search Tags: speaker recognition, Gaussian mixture model, supervector, factor analysis, nuisance attribute projection