A Study Of Several Problems On Noise Robust Speech Recognition

Posted on: 2008-11-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Lei
Full Text: PDF
GTID: 1118360215983678
Subject: Signal and Information Processing
Abstract/Summary:
With continuing progress in speech recognition technology, current systems achieve very high accuracy on read speech in clean conditions. Their performance, however, is known to degrade substantially in adverse acoustic environments because of the mismatch between training and testing conditions. Noise robustness is therefore a crucial problem for real-world applications of speech recognition. Building on a review and analysis of existing noise robust speech recognition techniques, this dissertation studies signal-space and feature-space techniques, including speech enhancement, feature compensation, feature normalization and voice activity detection. The main contributions and innovations are as follows:

1. Based on a discussion of existing noise robust speech recognition techniques, we summarized them and classified them into signal-space, feature-space and model-space approaches. After introducing the various noise robust approaches, we set out the main problems of noise robust speech recognition.

2. GMM-based two-stage Mel-warped Wiener filter. A new approach based on a Gaussian Mixture Model (GMM) is presented for estimating the a priori SNR in speech enhancement based on short-term spectral amplitude estimation. In the proposed method, a GMM of clean speech spectra is trained beforehand. During enhancement, the speech spectra and the a priori SNR are estimated from the GMM, and the a priori SNR is then supplied to the speech enhancement system. Speech enhancement experiments show that the proposed method achieves a considerable improvement over the commonly used recursive averaging method.
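For context, one common form of recursive averaging is the decision-directed estimate of the a priori SNR, which then sets a Wiener gain. The sketch below illustrates that conventional baseline for a single frequency bin; it is not the GMM-based estimator proposed in the dissertation. The function names, the smoothing weight alpha = 0.98, the floor xi_min and the assumption of a known, stationary noise power are illustrative choices that do not come from the dissertation.

```python
def wiener_gain(xi):
    """Wiener gain for an a priori SNR xi given on a linear scale."""
    return xi / (1.0 + xi)


def decision_directed_snr(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Track the a priori SNR of one frequency bin across frames.

    noisy_power: list of per-frame noisy powers |Y_t|^2 for this bin
    noise_power: noise power for this bin (assumed known and stationary)
    alpha:       recursive averaging weight (hypothetical default)
    xi_min:      lower floor on the estimate (hypothetical default)
    """
    xi_track = []
    prev_clean_power = None  # |G_{t-1} Y_{t-1}|^2 from the previous frame
    for y_pow in noisy_power:
        gamma = y_pow / noise_power           # a posteriori SNR
        instantaneous = max(gamma - 1.0, 0.0)  # half-wave rectified estimate
        if prev_clean_power is None:
            xi = instantaneous                # no history on the first frame
        else:
            # Weighted average of the previous clean-speech estimate
            # and the instantaneous estimate.
            xi = (alpha * prev_clean_power / noise_power
                  + (1.0 - alpha) * instantaneous)
        xi = max(xi, xi_min)
        gain = wiener_gain(xi)
        prev_clean_power = (gain ** 2) * y_pow
        xi_track.append(xi)
    return xi_track
```

In this sketch a bin that carries only noise stays near the floor xi_min, while a bin with strong speech energy settles at a high a priori SNR and a Wiener gain close to 1. A GMM-based estimator would replace the recursion with an estimate derived from the trained clean-speech model.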
We then applied the proposed estimation method to the two-stage Mel-warped Wiener filtering algorithm in the ETSI advanced front-end for distributed speech recognition. The resulting GMM-based two-stage Mel-warped Wiener filtering algorithm significantly improved the noise robustness of the speech recognition system.

3. Improved MVA-based algorithms for noise robust speech recognition. Based on an investigation of feature normalization algorithms, we presented an effective scheme that combines speech enhancement with feature normalization to improve the robustness of the speech recognition system. At the front end, minimum mean square error log-spectral amplitude speech enhancement is used to suppress noise in the noisy speech. This enhancement is imperfect, however: the enhanced speech retains signal distortion and residual noise, which affect recognition performance. Thus, at the back end, MVA feature normalization (mean and variance normalization followed by ARMA filtering) is used to deal with the remaining mismatch between enhanced speech and clean speech. Experimental results show that our approach yields considerable improvements in degraded environments. We also described a method that combines feature compensation with MVA feature normalization for robust speech recognition, studied different ways of combining the two, identified the best combination, and improved the performance of the speech recognition system.

4. Improved voice activity detection based on LRT. Based on a study of the Likelihood Ratio Test (LRT) voice activity detection algorithm and a discussion of the decision-directed LRT and smoothed LRT variants, we proposed a novel voice activity detection algorithm that improves the robustness of speech detection in noisy environments. In the new algorithm, a speech spectral GMM is introduced to model the clean speech spectra, yielding a GMM-based LRT voice activity detection algorithm.
Experimental results show that the proposed approach achieves a considerable performance improvement in voice activity detection.
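As a point of reference for contribution 4, the classical LRT decision under a complex Gaussian model of the speech and noise spectra can be sketched as follows. This is the conventional formulation that the GMM-based variant builds on, not the dissertation's own algorithm. The function names and the decision threshold of 0.5 are hypothetical, and the a priori SNR values are assumed to be supplied by a separate estimator.

```python
import math


def frame_log_lrt(noisy_power, noise_power, xi):
    """Mean per-bin log likelihood ratio for one frame.

    noisy_power[k]: noisy power |Y_k|^2 in bin k
    noise_power[k]: noise variance in bin k
    xi[k]:          a priori SNR in bin k (linear scale)
    """
    total = 0.0
    for y_pow, n_pow, xi_k in zip(noisy_power, noise_power, xi):
        gamma = y_pow / n_pow  # a posteriori SNR
        # Log of the Gaussian likelihood ratio for this bin.
        total += gamma * xi_k / (1.0 + xi_k) - math.log(1.0 + xi_k)
    return total / len(noisy_power)


def is_speech(noisy_power, noise_power, xi, threshold=0.5):
    """Declare speech when the mean log likelihood ratio exceeds the threshold.

    The default threshold is an illustrative value; in practice it is tuned.
    """
    return frame_log_lrt(noisy_power, noise_power, xi) > threshold
```

A frame whose power matches the noise estimate gives a log likelihood ratio near zero and is classified as non-speech, while a frame well above the noise floor gives a large positive value.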
Keywords/Search Tags: noise robust speech recognition, speech enhancement, feature compensation, feature normalization, voice activity detection