
Noise Robust Technologies In Speaker Recognition

Posted on: 2005-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Wu
Full Text: PDF
GTID: 2168360152968046
Subject: Information and Communication Engineering
Abstract/Summary:
Prevailing speaker recognition systems achieve very high accuracy on clean speech, but their performance degrades rapidly in noisy environments because of the mismatch between the acoustic models and the test speech. Noise robustness is therefore crucial for applying speaker recognition systems in real life.

Robust speaker recognition systems try to reduce the mismatch between the acoustic models and the test speech that interfering noise introduces. The mismatch can be mapped into three spaces: signal space, feature space and model space. Accordingly, techniques for robust speaker recognition can also be classified into three categories. This dissertation presents the author's research on robust speaker recognition in additive noise, which includes a new robust feature, PL_MFCC, in feature space; a GMM with direct cepstral coefficient liftering in model space; and the fusion of multi-domain approaches.

The dissertation first replaces the logarithm transformation in standard MFCC generation with a combined function to reduce the noise sensitivity of the logarithm, and proposes a new robust feature, PL_MFCC. PL_MFCC is then combined with speech enhancement methods. The results show that both the PL_MFCC feature and the combined systems effectively improve system performance, and that the combination of PL_MFCC and MMSE is the most effective.

Each dimension of the cepstral coefficients is then weighted according to its discriminative ability during recognition, yielding a GMM with direct cepstral coefficient weighting (denoted CW_GMM). Combining CW_GMM with MMSE further improves system accuracy.

Another contribution of this dissertation is the study and comparison of several fusion schemes. The MMSE+LA scheme combines MMSE and Log-Add. The LA+CW scheme combines Log-Add and CW_GMM. The MMSE+PL+CW scheme combines MMSE, PL_MFCC and CW_GMM. The results show that the LA+CW and MMSE+PL+CW schemes significantly increase recognition accuracy in noisy environments.
Keywords/Search Tags: speech enhancement, speaker recognition, GMM, MFCC, combination