Text-Independent Speaker Recognition with Telephone Speech Based on Feature Transformation and Classification

Posted on: 2008-10-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y L Jie
GTID: 1118360212998579
Subject: Signal and Information Processing
Abstract/Summary:
Because of its practical value, text-independent speaker recognition over telephone speech has become an active research topic in speech processing. Speaker recognition systems based on statistical models such as the GMM are currently the state of the art under complex conditions (multiple environments and multiple transmission channels). Statistical speaker models recast speaker recognition as estimating the distribution of each speaker's speech data, and they achieve good recognition results.

However, statistical models depend heavily on the data. If the training data is limited, the model has too many parameters to estimate precisely. If the training and test data are mismatched, parameters estimated from the training data do not suit the test data. Both effects degrade performance under practical, complex conditions. To further improve the performance and robustness of text-independent speaker recognition, this thesis studies feature transformation and classification in the feature space.

First, to address the difficulty of training a precise GMM when the training and test speech are mismatched, this thesis proposes two methods: piecewise normalization of the cumulative distribution of the speech parameters, and kurtosis normalization of the parameters. The two methods map the distributions of the training and test parameters to approximately Gaussian distributions, by means of the cumulative distribution and the kurtosis respectively. The feature distribution can then be modeled with fewer mixture components, and the GMM parameters can be estimated more precisely.
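As a rough illustration of mapping a feature distribution toward a Gaussian through its cumulative distribution, the sketch below applies plain rank-based CDF matching to a single feature dimension. It is not the piecewise normalization proposed in the thesis, whose details the abstract does not give; the function name `gaussianize` and the mid-rank CDF estimate are assumptions made for this example.

```python
from statistics import NormalDist


def gaussianize(values):
    """Warp a 1-D feature sequence to roughly standard-normal values.

    Hypothetical helper: each value is replaced by the standard-normal
    quantile of its empirical CDF position (mid-rank estimate).
    """
    n = len(values)
    if n == 0:
        return []
    std_normal = NormalDist(0.0, 1.0)
    # Indices sorted by feature value; sorted() is stable, so ties keep
    # their original order.
    order = sorted(range(n), key=lambda i: values[i])
    warped = [0.0] * n
    for rank, idx in enumerate(order):
        # Mid-rank CDF estimate, always strictly inside (0, 1).
        p = (rank + 0.5) / n
        warped[idx] = std_normal.inv_cdf(p)
    return warped


# Example: a skewed set of cepstral-like values (made-up numbers).
print(gaussianize([0.1, 0.2, 0.25, 0.3, 4.0]))
```

Applying the same warping to training and test features puts both on a common, approximately Gaussian scale, which is the property the abstract relies on when it argues that fewer mixture components are then needed.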
Both methods alleviate model over-training to some extent and make the telephone-speech speaker verification system more robust to utterance length and environment. The kurtosis normalization method is particularly useful because its transform function can be adapted with the training data. It does not consume speech data for the normalization itself and performs better on short utterances, which matters for the practical use of speaker recognition technology.

Second, to address the shortage of speech data in text-independent speaker verification with telephone and handset speech, this thesis proposes a speaker verification system built on a CGMM-UBM framework with feature classes and SVM fusion of multiple sub-systems. The framework is motivated by two observations: speech feature vectors are unevenly distributed in the cepstral space, and different regions of the cepstral space contribute differently to speaker verification. Experiments show that the CGMM-UBM framework makes better use of the training data, needs fewer mixture components, and trains more efficiently. The CGMM-UBM system also achieves better recognition results, is robust to noise, and is better suited to speaker verification with short speech. Because the SVM-based fusion is trained on two-class data, it is more discriminative and can capture the relationship between the sub-systems in detail. The SVM fusion exploits the potential of the CGMM-UBM sub-systems. In addition, it normalizes the output scores and reduces the verification system's dependence on the threshold. The experiments with all of the above methods give good results and demonstrate the methods' effectiveness.
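The score fusion idea can be sketched as a linear soft-margin SVM trained on vectors of sub-system scores, with target trials labeled +1 and impostor trials labeled -1. This is only a minimal sketch: the abstract does not state the kernel, features, or training procedure used in the thesis, so the subgradient-descent trainer, the function names, and all scores below are assumptions for illustration.

```python
def train_linear_svm(scores, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by batch subgradient descent on the regularized hinge loss.

    scores: list of per-trial score vectors, one entry per sub-system.
    labels: +1 for target trials, -1 for impostor trials.
    """
    dim = len(scores[0])
    n = len(scores)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        # Gradient of the L2 regularizer (lam / 2) * ||w||^2.
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(scores, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:
                # Hinge loss is active: add its subgradient.
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def fuse(w, b, x):
    """Fused verification score: positive favors the target hypothesis."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Made-up scores from two hypothetical sub-systems.
train_scores = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.0],
                [-1.0, -0.5], [-2.0, -1.2], [-0.8, -1.9]]
train_labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(train_scores, train_labels)
print(fuse(w, b, [1.9, 1.4]), fuse(w, b, [-1.1, -0.9]))
```

Because the decision boundary of the fused score sits at zero, the fusion stage also acts as a score normalizer, which matches the abstract's remark about reduced dependence on the verification threshold.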
Finally, to handle additive noise in speaker recognition, this thesis studies the two-stage Wiener filter used in the ETSI DSR AFE standard, which is known for its high performance in robust speech recognition. The thesis also proposes a VAD algorithm based on unsupervised segmentation and uses it to replace the VAD module in the noise power spectrum estimation of the standard's Wiener filter. This considerably improves the robustness of the speaker identification system to additive noise.

The thesis was supported by the National Foundation of Natural Science (No. 60272039), the Science Research Fund of the MOE-Microsoft Key Laboratory of Multimedia Computing and Communication (No. 05071810), the Anhui Province Foundation of Natural Science (No. 901042205), and the USTC Graduate Individual Creative Foundation (KD 2005052).
Keywords/Search Tags: Transformation