Automatic Mask Estimation In Speaker Recognition

Posted on: 2011-12-10 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Wang | Full Text: PDF
GTID: 2178360308955453 | Subject: Circuits and Systems
Abstract/Summary:
Text-independent speaker recognition performs well in noise-free environments, but additive noise creates a mismatch between training and testing conditions that degrades system performance dramatically. Missing feature methods label the heavily corrupted regions of the spectrogram and use only the reliable features for recognition, which improves the robustness of speaker recognition in noisy environments.

Missing feature methods in text-independent speaker recognition first identify the components of a spectrographic representation of speech that are considered corrupt. This process is called mask estimation, and accurate masks are fundamental to both missing feature reconstruction and recognition. Most mask estimation methods rely on the local SNR, which requires estimating the characteristics of the noise. Inaccurate noise estimates lead to poor system performance, and noise is especially hard to estimate in non-stationary environments. If we make no assumptions about the noise signal and exploit only the characteristics of the speech signal, we can reduce the impact of noise estimation errors on the recognition system. This thesis addresses automatic mask estimation for text-independent speaker recognition in noisy environments.

To make mask estimation automatic, we must extract speech features that describe the degree of corruption caused by noise; from these features, the reliability of each spectrogram region can be obtained directly. Four such speech features are discussed, and their behavior is verified both theoretically and experimentally. Our experiments show that these features are related to SNR across different noise types and SNR levels, which makes them promising for mask estimation. Because these features turn mask estimation into a classification problem, they can be used to train classifiers.
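Since most mask estimators threshold a local SNR, a minimal sketch of the two local-SNR baselines the abstract compares against (oracle masks and spectral-subtraction-based masks) may make the dependence on noise estimation concrete. It assumes a power spectrogram of shape (bands, frames), a 0 dB reliability threshold, and a noise estimate taken as the mean of the first few frames, which are assumed to be speech-free. All function names and parameter values are hypothetical and are not taken from the thesis.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero

def local_snr_db(speech_power, noise_power):
    """Local SNR in dB for every time-frequency cell."""
    return 10.0 * np.log10((speech_power + EPS) / (noise_power + EPS))

def oracle_mask(clean_power, noise_power, threshold_db=0.0):
    """Oracle mask: uses the true clean-speech and noise powers (only known in simulation)."""
    return local_snr_db(clean_power, noise_power) > threshold_db

def spectral_subtraction_mask(noisy_power, n_noise_frames=10, threshold_db=0.0):
    """Estimate the noise from leading frames assumed to be speech-free,
    subtract it, and threshold the resulting local SNR estimate."""
    noise_est = noisy_power[:, :n_noise_frames].mean(axis=1, keepdims=True)
    speech_est = np.maximum(noisy_power - noise_est, 0.0)
    return local_snr_db(speech_est, noise_est) > threshold_db

# Toy example: 2 bands x 12 frames, unit-power noise, speech only in the last two frames.
noise = np.ones((2, 12))
clean = np.zeros((2, 12))
clean[:, 10] = [10.0, 0.1]
clean[:, 11] = [0.1, 10.0]
noisy = clean + noise

m_oracle = oracle_mask(clean, noise)
m_ss = spectral_subtraction_mask(noisy)
print(m_oracle[:, 10:], m_ss[:, 10:], sep="\n")
```

In this stationary toy case the two masks agree. With non-stationary noise, the leading-frame estimate in `spectral_subtraction_mask` would drift away from the true noise, which is exactly the weakness that motivates estimating masks from speech features alone.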
The trained classifiers can then label the reliability of spectrogram regions directly, and this is how our automatic mask estimation is realized. We trained neural network classifiers on the four features, each of which separately describes how strongly clean speech is corrupted in every frequency band. Combining the four features also avoids cases in which, in a particular frequency band, a single feature cannot capture the effect of the noise. We performed recognition experiments with our automatic mask estimation and missing feature reconstruction under stationary F16 noise and non-stationary factory and babble noise at various SNRs. The results show that, under identical experimental conditions and compared with oracle mask estimation and spectral-subtraction-based mask estimation, our method performs well in both mask estimation accuracy and recognition accuracy. We also discuss extensions of this automatic mask estimation approach: training a classifier on two noise types instead of one, and using GMM-based rather than cluster-based reconstruction. The experimental results show that our method also performs well in these settings.
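Once a mask is available (from the trained classifiers or an oracle), the unreliable components still have to be filled in before recognition. The abstract names cluster-based reconstruction as the default method, so a simplified sketch of that idea may help: pick the clean-speech cluster that best explains the reliable components of a frame, then replace the unreliable components with that cluster's mean, capped at the observed noisy value. The cap reflects the approximation that, under additive noise, the noisy log energy is an upper bound on the clean log energy. This is a simplification of bounded estimation as usually described in the literature, not the thesis's implementation; the cluster parameters are assumed to be pre-trained on clean log-spectral features, and the function name is hypothetical.

```python
import numpy as np

def cluster_reconstruct(log_spec, mask, means, variances, weights):
    """Simplified cluster-based missing feature reconstruction (hypothetical helper).

    log_spec  : (bands, frames) noisy log-spectral features
    mask      : (bands, frames) bool, True = reliable
    means     : (K, bands) cluster means trained on clean speech
    variances : (K, bands) diagonal cluster variances
    weights   : (K,) cluster prior probabilities
    """
    out = log_spec.copy()
    for t in range(log_spec.shape[1]):
        x, r = log_spec[:, t], mask[:, t]
        if r.all():
            continue  # nothing to reconstruct in this frame
        u = ~r
        # Score each cluster on the reliable components only (marginalize the rest).
        diff = x[r] - means[:, r]
        var = variances[:, r]
        log_lik = np.log(weights) - 0.5 * np.sum(
            np.log(2 * np.pi * var) + diff**2 / var, axis=1)
        k = int(np.argmax(log_lik))
        # Fill unreliable components with the best cluster's mean, capped at the
        # observed noisy value (approximately an upper bound on clean speech).
        out[u, t] = np.minimum(means[k, u], x[u])
    return out

# Toy example: 3 bands, 4 frames, two clean-speech clusters.
means = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
variances = np.ones((2, 3))
weights = np.array([0.5, 0.5])
log_spec = np.array([[5.1, 0.2, 1.0, 5.0],
                     [4.9, -0.1, 2.0, 5.0],
                     [9.0, 3.0, 3.0, 2.0]])
mask = np.array([[True, True, True, True],
                 [True, True, True, True],
                 [False, False, True, False]])
print(cluster_reconstruct(log_spec, mask, means, variances, weights))
```

Reliable cells pass through unchanged. A GMM-based variant, which the abstract mentions as an extension, would typically weight the estimates by each component's posterior rather than committing to a single best cluster.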
Keywords/Search Tags: speaker recognition, mask estimation, classifier, missing feature reconstruction