
Research On Robust Algorithms In Continuous Speech Recognition

Posted on: 2007-02-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Xu
Full Text: PDF
GTID: 1118360212975809
Subject: Military Intelligence
Abstract/Summary:
Inter-speaker variation, channel distortion and background noise cause a mismatch between the training and testing conditions, and this mismatch significantly degrades the performance of speaker-independent continuous speech recognition systems. To increase the robustness and adaptability of Chinese continuous speech recognition, this dissertation studies speaker normalization, speech enhancement, endpoint detection, feature compensation and uncertainty decoding methods, viewed from the signal space, feature space and model space. Several new methods are proposed and evaluated through extensive experiments. The main contributions of the dissertation are as follows:

1. A vocal tract length normalization method based on bilinear frequency warping is proposed. Traditional frequency warping methods have two faults: the vocal tract model is too simple, and the bandwidth (BW) of the transformed signal differs from that of the original. We compute the frequency warp factor from the mapping of the cut-off frequency of a prototype low-pass filter onto that of the desired low-pass filter. The Mel filterbanks are then adjusted by bilinear frequency warping to obtain vocal-tract-normalized MFCCs. The method avoids an exhaustive search for the frequency warp factor and warps the spectrum continuously without suffering from the bandwidth problem. It proves to be a fast adaptation technique that is especially suitable for unsupervised adaptation. The effectiveness of the method is examined on isolated and continuous speech recognition. The baseline isolated digit recognizer is trained on adult males' data, and the baseline continuous speech recognizer is trained on men's data. After vocal tract normalization, the recognition accuracy on adult females' isolated digits improves from 71.50% to 91.00%, and that on children's isolated digits improves from 71.00% to 84.00%. In continuous speech recognition, the recognition accuracy on women's speech improves from 13.91% to 50.56%.

2. To increase the robustness of speech recognition in a multi-channel environment, a GMM (Gaussian Mixture Model)-based channel classifier is used. If the speech signals filtered by a given channel are modeled by a GMM, the differences between channels can be characterized by their GMMs, and the GMMs of different channels are discriminable. The GMM-based channel classifier selects the most likely HMM from the HMMs pre-trained for each specific telephone channel environment. The selected HMM is used as the reference HMM to recognize each utterance. The results of Mandarin continuous speech recognition show that the proposed scheme is an efficient framework for enhancing the robustness of speech recognition in a multi-channel environment.

3. A speech enhancement algorithm based on the discrete cosine transform and auditory masking properties is derived. The discrete cosine transform is used to approximate the Karhunen-Loeve transform (KLT) in subspace-based speech enhancement, which reduces the computation of the eigenvalues of an N×N symmetric Toeplitz matrix...
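The warp-factor computation in contribution 1 can be illustrated with the standard low-pass to low-pass all-pass transformation from digital filter design. This is a minimal sketch, not the dissertation's implementation: the function names, the choice of cut-off frequencies, the direction of the mapping and the idea of warping filter centre frequencies directly are all assumptions made for illustration.

```python
import math

def warp_factor(theta_p, omega_p):
    """All-pass coefficient that maps cut-off omega_p onto cut-off theta_p.

    Both cut-offs are in radians per sample, in (0, pi). This is the
    textbook low-pass to low-pass design formula; which cut-off plays
    the "prototype" role in the dissertation is an assumption here.
    """
    return math.sin((theta_p - omega_p) / 2.0) / math.sin((theta_p + omega_p) / 2.0)

def bilinear_warp(omega, alpha):
    """Bilinear (first-order all-pass) frequency warping of omega in [0, pi]."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

def warp_hz(freqs_hz, alpha, sample_rate):
    """Warp frequencies given in Hz; 0 Hz and the Nyquist frequency stay fixed."""
    nyquist = sample_rate / 2.0
    return [bilinear_warp(math.pi * f / nyquist, alpha) * nyquist / math.pi
            for f in freqs_hz]

# Hypothetical example: map a cut-off at pi/3 onto a cut-off at pi/2,
# then warp a few illustrative filter centre frequencies at 16 kHz.
alpha = warp_factor(math.pi / 2.0, math.pi / 3.0)
centres_hz = [0.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]
warped_hz = warp_hz(centres_hz, alpha, 16000)
```

Because the warping function fixes both 0 and π, the warped axis covers the same band as the original, which is consistent with the abstract's claim that the bandwidth problem is avoided. The factor also comes from a closed-form expression rather than a grid search.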
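The channel selection step in contribution 2 can be sketched as a maximum-likelihood decision over per-channel GMMs. The abstract does not give the feature type, the number of mixtures or the channel inventory, so the diagonal-covariance GMMs, the channel names and the toy two-dimensional frames below are hypothetical and serve only to show the decision rule.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(l - peak) for l in logs))

def classify_channel(frames, channel_gmms):
    """Return the channel whose GMM gives the utterance the highest total log-likelihood."""
    scores = {name: sum(gmm_frame_loglik(x, gmm) for x in frames)
              for name, gmm in channel_gmms.items()}
    return max(scores, key=scores.get), scores

# Hypothetical channel models and utterance features (illustration only).
channel_gmms = {
    "landline": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "mobile":   [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}
frames = [[4.2, 3.9], [4.8, 5.1], [4.5, 4.4]]
best_channel, scores = classify_channel(frames, channel_gmms)
# best_channel would then be used to look up the HMM set pre-trained for that channel.
```

Scoring the whole utterance rather than individual frames matches the abstract's description of choosing one reference HMM per utterance.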
Keywords/Search Tags: Recognition