
Study Of Speech Recognition Algorithm Under Noise Environment

Posted on: 2012-04-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Lv
GTID: 1118330338971098
Subject: Computer application technology
Abstract/Summary:
With the development of speech recognition technology, the performance of Automatic Speech Recognition (ASR) systems has improved considerably. As a convenient, fast and effective form of Human-Computer Interface (HCI), ASR has gradually entered people's daily lives. In practice, however, the mismatch between the training and recognition environments causes the performance of these systems to degrade dramatically. Improving the robustness of ASR systems has therefore become the key factor that decides whether they can be widely used in practical conditions.
Based on a review and analysis of existing robust speech recognition algorithms and of the influence of noise on ASR systems, this thesis presents research on speech enhancement, feature enhancement and model compensation/enhancement, which act on the signal space, feature space and model space of an ASR system respectively. The main research work and innovations are as follows.
A dynamic noise power spectrum estimation method and an improved Wiener filter algorithm were proposed. They use band-partitioning spectral entropy to achieve accurate and robust speech endpoint detection. In non-speech segments, the noise power spectrum is estimated frame by frame and weighted with the previous estimate, and this updated estimate, rather than a fixed noise power spectrum, is used to calculate the a priori Signal-to-Noise Ratio (SNR). Experimental results show that the proposed speech enhancement algorithm improves the recognition rate of the ASR system.
A denoising algorithm based on multi-order autocorrelation was studied; it retains the structure of the speech spectrum while suppressing noise. Because the multi-order autocorrelation sequence is not severely affected by noise, the observation sequence obtained after multi-order autocorrelation, rather than the noisy speech sequence itself, can be used to suppress the noise.
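The first contribution above (entropy-based endpoint detection driving a frame-by-frame noise estimate for a Wiener filter) can be illustrated with a minimal NumPy sketch. The abstract gives neither the band partition nor any constants, so the equal-width bands, the entropy threshold `entropy_thresh`, the smoothing factor `alpha`, the assumption that the first `n_init` frames contain only noise, the decision-directed a priori SNR estimate (weight `beta`) and the gain floor are all illustrative assumptions, not the thesis's parameters. Both function names are hypothetical.

```python
import numpy as np


def normalized_band_entropy(power: np.ndarray, n_bands: int = 8) -> float:
    """Spectral entropy of one frame after grouping FFT bins into n_bands
    equal-width bands (an assumed partition), normalised to [0, 1].
    A flat, noise-like spectrum gives values near 1; a peaky,
    speech-like spectrum gives lower values."""
    band_energy = np.array([b.sum() for b in np.array_split(power, n_bands)])
    p = band_energy / (band_energy.sum() + 1e-12)
    entropy = -np.sum(p * np.log(p + 1e-12))
    return float(entropy / np.log(n_bands))


def wiener_with_noise_tracking(spec, n_init=5, alpha=0.9, beta=0.98,
                               entropy_thresh=0.9, n_bands=8, gain_floor=0.05):
    """Enhance a complex STFT `spec` of shape (frames, bins).

    Returns the enhanced STFT and the final noise power estimate.
    All constants are illustrative assumptions."""
    power = np.abs(spec) ** 2
    # Assumption: the leading frames are non-speech and seed the noise estimate.
    noise_psd = power[:n_init].mean(axis=0)
    prev_clean = np.zeros(spec.shape[1])
    enhanced = np.empty_like(spec)
    for t in range(spec.shape[0]):
        if normalized_band_entropy(power[t], n_bands) > entropy_thresh:
            # Non-speech frame: weight the new periodogram with the previous estimate.
            noise_psd = alpha * noise_psd + (1.0 - alpha) * power[t]
        gamma = power[t] / (noise_psd + 1e-12)  # a posteriori SNR
        # Decision-directed a priori SNR (a swapped-in standard estimator).
        xi = (beta * prev_clean / (noise_psd + 1e-12)
              + (1.0 - beta) * np.maximum(gamma - 1.0, 0.0))
        gain = np.maximum(xi / (1.0 + xi), gain_floor)  # Wiener gain with a floor
        enhanced[t] = gain * spec[t]
        prev_clean = np.abs(enhanced[t]) ** 2
    return enhanced, noise_psd
```

The entropy test only decides when the noise estimate is refreshed; the Wiener gain is applied to every frame, so a strong harmonic peak passes almost unchanged while flat noise is attenuated toward the floor.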
The derivation of this multi-order autocorrelation method is given, and speech recognition experiments with different autocorrelation orders are carried out and analyzed.
A robust speech feature extraction algorithm based on Independent Component Analysis (ICA) was proposed to resolve the mismatch between training and testing features in convolutive noise environments. Noisy speech signals are first converted from the time domain to the frequency domain via the Short-Time Fourier Transform (STFT); a complex ICA algorithm then recovers the short-time spectrum of the speech signal from that of the noisy speech; finally, Mel Frequency Cepstral Coefficients (MFCC) and their first-order differential coefficients are computed from the separated speech spectrum. Experimental results show that speech features based on frequency-domain ICA are robust.
A permutation alignment algorithm based on Dynamic Time Warping (DTW) was proposed to eliminate the permutation ambiguity of frequency-domain ICA for speech signals. Because the signals in adjacent frequency bins are highly similar, DTW can be used to match the separated signals of adjacent bins, and the output positions are then adjusted according to the matching result. Experimental results show that the proposed algorithm reduces permutation errors and improves the quality of the separated speech.
The fundamental principle of the Parallel Model Combination (PMC) algorithm was studied, and its implementation was derived for both additive and convolutional noise environments. In addition, a noise spectrum estimation method based on two channels was proposed.
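One common PMC variant, the log-normal approximation, combines a clean-speech Gaussian and a noise Gaussian by moving both to the linear power domain, adding them there, and mapping the sum back. The sketch below is that generic textbook version, not the derivation given in the thesis. It assumes diagonal Gaussians already expressed in the log filter-bank domain (a full implementation working on cepstral models would first map them back with an inverse DCT) and models a convolutional channel as a linear-domain `gain` on the speech term. The function name `lognormal_pmc` is hypothetical.

```python
import numpy as np


def lognormal_pmc(mu_s, var_s, mu_n, var_n, gain=1.0):
    """Log-normal PMC for diagonal Gaussians in the log filter-bank domain.

    mu_s, var_s: clean-speech mean and variance per channel
    mu_n, var_n: noise mean and variance per channel
    gain: linear-domain factor on the speech term, e.g. a channel
          (convolutional) response; 1.0 means additive noise only
    Returns the mean and variance of the corrupted-speech Gaussian.
    """
    mu_s, var_s, mu_n, var_n = (np.asarray(a, dtype=float)
                                for a in (mu_s, var_s, mu_n, var_n))
    # Log domain -> linear domain: moments of a log-normal variable.
    lin_mu_s = np.exp(mu_s + var_s / 2.0)
    lin_var_s = lin_mu_s ** 2 * (np.exp(var_s) - 1.0)
    lin_mu_n = np.exp(mu_n + var_n / 2.0)
    lin_var_n = lin_mu_n ** 2 * (np.exp(var_n) - 1.0)
    # Speech and noise are assumed independent and additive in linear power.
    lin_mu = gain * lin_mu_s + lin_mu_n
    lin_var = gain ** 2 * lin_var_s + lin_var_n
    # Linear domain -> log domain, assuming the sum is again log-normal.
    var_y = np.log(lin_var / lin_mu ** 2 + 1.0)
    mu_y = np.log(lin_mu) - var_y / 2.0
    return mu_y, var_y
```

With a noise mean far below the speech mean the clean model is returned unchanged, a channel gain of `exp(h)` simply shifts the log-domain mean by `h`, and two equal, nearly deterministic components give the familiar log-add result of the mean plus `log 2`.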
The two-channel noise estimation method first separates the short-time spectra of speech and noise in the reference channel with a frequency-domain ICA algorithm; the noise short-time spectrum is then obtained by subtracting the estimated "clean" speech short-time spectrum from the noisy speech in the main channel. Experimental results validated the accuracy of the estimated noise signal and showed that the proposed algorithm improves the robustness of the ASR system in noisy environments.
In addition, the recognition rate of a traditional full-band Hidden Markov Model (HMM) decreases when some frequency bands are corrupted by noise. To solve this problem, a hybrid model combining parallel sub-band HMMs and a neural network (NN) was proposed. The algorithm first splits the full-band HMM into several sub-band HMMs, to which separate speech recognizers can be applied independently. New feature parameters are then extracted from the outputs of all sub-band HMMs, and finally these features are merged by the neural network to yield a global recognition decision. The results show that the proposed model provides better robustness for noisy speech.
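The sub-band scheme in the last contribution can be sketched in two pieces: splitting log filter-bank energies into contiguous sub-bands with a separate cepstrum per band (each band would feed its own recognizer), and merging per-band recognizer scores with a small multilayer perceptron. The abstract does not say how the bands are divided, which features are extracted from the sub-band HMM outputs, or how the network is built, so the equal-width split, the use of per-band class posteriors as fusion features and the single `tanh` hidden layer are assumptions. The HMM recognizers are omitted, and the weights `w1, b1, w2, b2` stand for parameters that would come from training, which is not shown. All function names are hypothetical.

```python
import numpy as np


def dct_ii(x: np.ndarray, n_coeffs: int) -> np.ndarray:
    """Unnormalised DCT-II along the last axis, keeping n_coeffs terms."""
    k = x.shape[-1]
    n = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * n * (np.arange(k) + 0.5) / k)  # (n_coeffs, k)
    return x @ basis.T


def subband_cepstra(log_fbank: np.ndarray, n_bands: int, n_coeffs: int):
    """Split log filter-bank energies (frames, channels) into contiguous,
    roughly equal sub-bands (assumed split) and compute a cepstrum per band."""
    return [dct_ii(band, min(n_coeffs, band.shape[1]))
            for band in np.array_split(log_fbank, n_bands, axis=1)]


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_subband_scores(band_loglik, w1, b1, w2, b2):
    """Merge sub-band recognizer scores with a one-hidden-layer MLP.

    band_loglik: (n_bands, n_classes) log-likelihoods, one row per
    sub-band recognizer. Each row becomes class posteriors, the rows are
    concatenated into one feature vector, and the MLP returns global
    class posteriors."""
    features = softmax(np.asarray(band_loglik, dtype=float), axis=1).ravel()
    hidden = np.tanh(w1 @ features + b1)
    return softmax(w2 @ hidden + b2)
```

Normalising each band's scores into posteriors before fusion keeps one badly corrupted band from dominating through sheer score magnitude; whether the thesis does this is not stated.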
Keywords/Search Tags: Speech recognition, Robustness, Speech enhancement, Feature extraction, Independent component analysis, Dynamic time warping, Parallel model combination, Neural network