Feature Compensation For Automatic Speech Recognition

Posted on: 2011-05-18  Degree: Master  Type: Thesis
Country: China  Candidate: Z Yang
GTID: 2178360308955289  Subject: Signal and Information Processing
Abstract/Summary:
This thesis focuses on the noise-robust front end of automatic speech recognition (ASR).

The ultimate goal of speech recognition is to enable computers to understand spontaneous human speech. Many mature systems already achieve fairly high recognition accuracy in the laboratory. In real environments, however, their performance degrades too severely for practical use because of interference from various noises and other unknown factors. Noise robustness is therefore a very important part of speech recognition research. The lack of robustness can be traced to the mismatch between the training and testing environments. In the real world, this mismatch is caused by the speech collection environment (additive noise, convolutional noise, etc.) and by the speaker (speaking style, accent, etc.); all of these influences can be regarded as noise. To keep a speech recognition system performing well under such conditions, various methods must be used to enhance its robustness.

Noise-robust methods are diverse, but they can be roughly classified into two categories: front-end methods and back-end methods. Front-end methods mitigate the effect of noise by processing the speech signal or the speech features, whereas back-end methods adapt the models to changes in the environment so that the models match the real environment. This thesis focuses mainly on front-end noise-robust methods: some existing algorithms are implemented, and several new methods are proposed.

Chapter 1 gives an overview of the development history of ASR and highlights several important components of ASR systems based on statistical modeling. Because noises are so diverse, there are many kinds of noise-robust front-end methods, each with its own characteristics and scope of application. Chapter 2 therefore introduces and summarizes them from four aspects: robust feature extraction, speech enhancement, feature compensation/enhancement, and model adaptation.

Chapter 3 first introduces offline feature compensation based on a first-order Vector Taylor Series (VTS) approximation, which uses an explicit model of environmental distortion. The offline algorithm is not ideal in practice: its biggest drawback is its heavy computational cost, which reduces processing efficiency. A practical first-order VTS approximation is therefore proposed; it keeps performance comparable to the offline algorithm while greatly improving efficiency.

Although the practical first-order VTS algorithm performs well, it shares an assumption with the offline algorithm: for each sentence, the noise feature vector in the cepstral domain follows a single Gaussian probability density function (PDF). Given the diversity and complexity of noise, a single Gaussian may not describe the noise distribution adequately, so the clean speech is estimated inaccurately, which ultimately degrades recognition performance. Chapter 4 therefore proposes a first-order VTS approximation in which the cepstral-domain noise feature vector follows a multi-Gaussian PDF. The results show that this method improves system performance to some extent.
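The first-order VTS feature compensation summarized above can be illustrated with a minimal sketch. It assumes the commonly used cepstral-domain distortion model y = x + h + C log(1 + exp(C^-1 (n - x - h))) for clean speech x, noise n and channel h; a square orthonormal DCT matrix C (so that C^-1 = C^T, whereas real front ends truncate the cepstrum and need a pseudo-inverse); a clean-speech GMM; and noise and channel parameters that are already known rather than re-estimated with EM. The clean feature is estimated with the widely used approximation x_hat = y - sum over (k, j) of P(k, j | y) (mu_y[k,j] - mu_x[k]). Noise is passed in as a GMM, so one component corresponds to the single-Gaussian noise assumption and several components give a multi-Gaussian noise model. All function names (`dct_matrix`, `vts_expand`, `vts_compensate`) and toy values are hypothetical; this is not the thesis's implementation and does not reproduce the efficiency improvements of its practical algorithm.

```python
import numpy as np


def dct_matrix(d):
    """Orthonormal DCT-II matrix of size d x d (assumption: square, so C^-1 = C^T)."""
    k = np.arange(d)[:, None]
    m = np.arange(d)[None, :]
    c = np.sqrt(2.0 / d) * np.cos(np.pi * k * (2 * m + 1) / (2 * d))
    c[0, :] /= np.sqrt(2.0)
    return c


def vts_expand(mu_x, mu_n, mu_h, C):
    """First-order VTS expansion of y = x + h + C log(1 + exp(C^T (n - x - h))).

    Returns the mismatch term g evaluated at the means and the Jacobian G = dy/dx
    (then dy/dn = I - G and dy/dh = G).
    """
    u = C.T @ (mu_n - mu_x - mu_h)              # log-spectral noise-to-speech ratio
    g = C @ np.logaddexp(0.0, u)                # C log(1 + exp(u)), numerically stable
    s = 0.5 * (1.0 - np.tanh(0.5 * u))          # 1 / (1 + exp(u)), numerically stable
    G = C @ np.diag(s) @ C.T
    return g, G


def gauss_logpdf(y, mu, cov):
    """Log density of a full-covariance Gaussian."""
    diff = y - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + maha)


def vts_compensate(y, speech_gmm, noise_gmm, mu_h, C):
    """Estimate clean cepstra: x_hat = y - sum_{k,j} P(k, j | y) (mu_y[k,j] - mu_x[k]).

    speech_gmm and noise_gmm are (weights, means, covariances) tuples; a noise GMM
    with one component is the single-Gaussian noise case. Noise and channel
    parameters are assumed known (no EM re-estimation in this sketch).
    """
    w_x, mu_x, cov_x = speech_gmm
    w_n, mu_n, cov_n = noise_gmm
    eye = np.eye(len(y))
    log_post, shifts = [], []
    for k in range(len(w_x)):
        for j in range(len(w_n)):
            g, G = vts_expand(mu_x[k], mu_n[j], mu_h, C)
            mu_y = mu_x[k] + mu_h + g                        # compensated mean
            A = eye - G                                      # dy/dn
            cov_y = G @ cov_x[k] @ G.T + A @ cov_n[j] @ A.T  # compensated covariance
            log_post.append(np.log(w_x[k]) + np.log(w_n[j])
                            + gauss_logpdf(y, mu_y, cov_y))
            shifts.append(mu_y - mu_x[k])
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())                 # shift for stability
    post /= post.sum()                                       # P(k, j | y)
    return y - post @ np.array(shifts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    C = dct_matrix(d)
    # Hypothetical toy models: 2-component clean-speech GMM in the cepstral domain.
    speech = (np.array([0.6, 0.4]),
              [C @ np.full(d, 2.0), C @ np.full(d, 4.0)],
              [0.5 * np.eye(d), 0.5 * np.eye(d)])
    noise_single = (np.array([1.0]), [C @ np.full(d, 1.0)], [0.2 * np.eye(d)])
    noise_mixture = (np.array([0.5, 0.5]),
                     [C @ np.full(d, 0.5), C @ np.full(d, 1.5)],
                     [0.2 * np.eye(d), 0.2 * np.eye(d)])
    y = C @ np.full(d, 3.0) + 0.1 * rng.standard_normal(d)  # one noisy frame
    print(vts_compensate(y, speech, noise_single, np.zeros(d), C))
    print(vts_compensate(y, speech, noise_mixture, np.zeros(d), C))
```

Running the script prints the compensated feature vector for the same noisy frame under a one-component and a two-component noise model. Note that `vts_expand` depends only on the model means, not on the observed frame, so in a real system the expansions could be computed once per utterance and reused for every frame.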
Keywords/Search Tags: ASR, noise robustness, VTS, feature compensation, practical, multi-Gaussian modeling