Research On Speaker Recognition Over Short Utterance And Varying Channels | Posted on:2014-01-28 | Degree:Doctor | Type:Dissertation | Country:China | Candidate:Y Jiang | GTID:1228330395983696 | Subject:Pattern Recognition and Intelligent Systems | Abstract/Summary:
Automatic speaker recognition has become an increasingly important modern technology required by many speech-aided applications. The main challenge for automatic speaker recognition is to cope with short test utterances and with the variability of the environments and channels through which the speech is obtained. Previous work has achieved good results for clean, high-quality speech with matched training and test acoustic conditions, such as high accuracy in speaker identification and verification using clean wideband speech and Gaussian mixture models (GMM). However, with short utterances, mismatched channels, and even distant-talking environments, all of which are common in real-world conditions, the performance of GMM-based systems degrades significantly and falls far short of a satisfactory level. To make speaker recognition more practical, robustness has become a crucial research issue in the field. At present, the variability between training and testing channels is the biggest obstacle to the development of speaker recognition. This variability can arise from different channel types for training and testing speech, from the same channel type but different collection equipment (telephone, microphone, etc.), from different collection environments (quiet or noisy), and from different collection manners (close or distant) for training and testing.
In this thesis, our main focus is to improve the robustness of distant speaker identification and of speaker verification under different channel types for training and testing speech. The major research work in the dissertation includes the following aspects:
1. To address the inadequate training and testing speech data in speaker identification based on short utterances, the feature vectors and GMM models are optimized and improved, and an efficient GMM based on local principal component analysis (PCA) with fuzzy clustering is presented. To compensate for the limited feature samples, the effective feature dimensionality is increased by using feature combinations instead of a single feature. Furthermore, the time and space complexity of the system is reduced by lowering the dimensionality of the feature combinations with local fuzzy PCA, with little effect on the recognition rate. The dimensionality of the original features is reduced from 48 to 16, and modeling time is reduced by nearly 65%. Finally, a new approach that combines division and fuzzy k-means clustering is used to optimize the GMM initialization parameters. The improved method is more effective at improving system performance than traditional initialization methods.
2. Robust speaker identification is presented for testing speech recorded by a distant microphone. Three compensation approaches are investigated to improve the robustness of speaker identification in such environments. The first approach applies spectral subtraction before feature extraction to reduce the late-reverberation effect and minimize the differences in quality between training and testing speech, thereby achieving speech enhancement. The second approach uses feature warping as feature compensation in order to make the feature-space distributions of the same speaker's training and test speech as consistent as possible.
The third compensation approach employs a novel method of initializing GMM parameters, drawn from the research on short-utterance speaker identification: combined division and k-means clustering. The three compensation methods correspond to three important stages of speaker identification: speech enhancement, feature extraction, and model training. Compared with the traditional GMM method, they greatly improve the performance of the distant speaker identification system.
3. This part builds on joint factor analysis (JFA), a state-of-the-art algorithm that can handle speaker verification under channel variability. A novel eigen-channel space stitching technique is proposed as an improvement on traditional JFA; it addresses the degraded speaker verification performance caused by unbalanced training speech from various channels. The speech from each channel is trained into a corresponding eigen-channel space matrix by the stitching algorithm, and the trained matrices are then put together to initialize the eigen-channel space matrix of the final JFA model. The final eigen-channel space matrix can then be trained from this initialized matrix. On the basis of JFA, we further investigate the I-Vector technique. A novel channel compensation approach that combines linear discriminant analysis (LDA) and within-class covariance normalization (WCCN) is proposed, based on an analysis of several existing channel compensation approaches. The algorithm combines the advantages of discrimination maximization in LDA and overall cost minimization in WCCN, and it effectively improves I-Vector speaker verification performance.
4. Building on probabilistic linear discriminant analysis (PLDA) in the I-Vector space to handle speaker and session variability in the speaker verification task, we advocate the use of the uncompressed form of the I-Vector.
An I-Vector is a low-dimensional vector, extracted from a speech segment, that contains both speaker and channel information. When PLDA is applied to I-Vectors, dimension reduction is performed twice: first in the I-Vector extraction process and again in the PLDA model. Keeping the full dimensionality of the I-Vector in the supervector space for PLDA modeling and scoring avoids unnecessary loss of information. The drawback of using PLDA on uncompressed I-Vectors is the need to invert large matrices, which we show can be done rather efficiently by partitioning a large matrix into smaller blocks. We also propose the Gaussianized rank-norm in the supervector space for feature normalization prior to PLDA modeling.
Keywords/Search Tags: speaker recognition, speaker identification, speaker verification, short utterance, distant, JFA, I-Vector, PLDA
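As an illustration of the Gaussianized rank-norm named in the abstract, the following is a minimal Python sketch, not the dissertation's implementation. The abstract does not give the exact formulation, so the sketch assumes a common reading: each element of a vector is replaced by its rank within a background set for that dimension, and the rank is then mapped through the inverse standard normal CDF. The function names `fit_rank_norm` and `gaussianized_rank_norm`, the toy background data, and the quantile offset `(rank + 0.5) / (N + 1)` are all assumptions made for illustration.

```python
from bisect import bisect_left
from statistics import NormalDist


def fit_rank_norm(background):
    """Sort the background values of each dimension once, for later rank lookups."""
    dims = len(background[0])
    return [sorted(vec[d] for vec in background) for d in range(dims)]


def gaussianized_rank_norm(vector, sorted_columns):
    """Map each element to the standard-normal quantile of its background rank."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    out = []
    for value, column in zip(vector, sorted_columns):
        # Rank = number of background values strictly below this value.
        rank = bisect_left(column, value)
        # Assumed offset keeps the quantile strictly inside (0, 1).
        quantile = (rank + 0.5) / (len(column) + 1)
        out.append(std_normal.inv_cdf(quantile))
    return out


# Hypothetical 2-dimensional background set (toy values, not real supervectors).
background = [[0.1, 5.0], [0.4, 7.0], [0.2, 6.0], [0.9, 9.0]]
columns = fit_rank_norm(background)

mid = gaussianized_rank_norm([0.3, 6.5], columns)       # middle of the background ranks
extremes = gaussianized_rank_norm([0.0, 10.0], columns)  # below and above all background values
```

Because only ranks are used, the output of each dimension is approximately standard normal regardless of the original scale, which is the property that makes such a normalization a plausible preprocessing step before a Gaussian model such as PLDA.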