The Research Of Front-end Filter For Speaker Independent Robust Speech Recognition

Posted on: 2012-03-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L X Huang
GTID: 1118330332991029
Subject: Circuits and Systems
Abstract/Summary:
Speaker-independent speech recognition performs well under clean conditions. In noisy environments, however, the recognition rate drops dramatically. Recognition accuracy is also affected by speech variability, which further increases the difficulty of the task. To improve robustness to both noise and speech variability, this thesis focuses on the front-end filter, which plays a significant role in feature extraction. The filters are designed either from perceptual criteria or from the speech signal itself, so that they either match the properties of human hearing more closely or analyze the speech signal in finer detail. The noise-robustness experiments show that as the filter properties improve, the corresponding features become more robust. Furthermore, improvements in filter performance are consistent with increases in variability robustness. The main contributions of this thesis are as follows.

(1) Starting from the FIR filter, the design of the Laguerre filter is described in detail. The Laguerre filter is used in place of the FIR filter to extract ZCPA (Zero Crossing Peak Amplitude) features, and the procedure for extracting ZCPA in the frequency domain with the Laguerre filter is illustrated step by step. The Laguerre filter has both the linear phase of an FIR filter and the long memory of an IIR filter, and it compensates for the poor stop-band and pass-band properties of the FIR filter. The experiments show that the Laguerre filter, when supplied with the exact center frequency and bandwidth of each channel, is more robust than the FIR filter.

(2) The FIR and Laguerre filters have symmetric bandwidths, which do not match the properties of human hearing. To solve this problem, WFBs (Warped Filter Banks) are implemented and used for ZCPA extraction.
The warping factor ρ of the first-order all-pass function controls the distribution of the filters' center frequencies and bandwidths, so the bands are nonuniform and each bandwidth is asymmetric. The typical values ρ = 0.48 and ρ = 0.63 correspond to the Bark scale and the ERB scale, respectively. Unlike the FIR and Laguerre filters, the WFBs do not require the exact center frequency and bandwidth of each band, and the frequency responses of all 16 channels are obtained simultaneously. The experiments show that, compared with uniform bands of symmetric bandwidth, nonuniform bands of asymmetric bandwidth improve the recognition rates significantly. Moreover, compared with the FIR and Laguerre filters, the WFBs have a simple design method and still provide asymmetric bandwidths. Therefore, the ERB-scale WFBs achieve better recognition results and noise robustness.

(3) To analyze the speech signal itself, the OFB (Optimized Filter Bank) is proposed on the basis of digital signal processing theory, and the ABFB (Adaptive Bands Filter Bank) is then presented. Whereas the FIR, Laguerre and WFB models are based on human hearing criteria, the OFB model uses recognition performance as its benchmark. In this novel approach, the front-end filter and the back-end recognition system are combined into a closed loop that is optimized by a Genetic Algorithm. The experiments show that the OFB model outperforms the Bark-scale filter. However, the OFB is not easy to apply because it involves a large number of models. The ABFB model is therefore built by simplifying the OFB model. The experiments show that the ABFB still outperforms the Bark-scale filter and even the ERB-scale filter. Among the FIR, Laguerre, WFB and ABFB models, the ABFB has the best noise robustness, which also demonstrates that the speech signal itself is important for filter design.

(4) The number of filter bands determines how finely the signal is analyzed.
The FIR, Laguerre, WFB and ABFB filters use 16 bands and 16 frequency bins for ZCPA extraction. When the Gammatone (GT) filter is used to extract ZCPA, K channels are designed and a corresponding number of frequency bins are used to accumulate the amplitude information. The experiments show that the 18-channel GT filter gives better recognition results than GT filters with any other number of channels.

(5) When the FIR, GT, Laguerre and WFB filters are applied to a variability corpus in a speaker-independent recognition task, the experiments show that variability robustness improves as the filter properties improve. Moreover, compared to MFCC, ZCPA is more robust with a Support Vector Machine (SVM) than with a Hidden Markov Model (HMM).
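
As an illustration of how the warping factor in contribution (2) shapes the bands, the following Python sketch uses the standard phase-derived frequency map of a first-order all-pass section, (z^-1 - ρ)/(1 - ρ z^-1). It is a minimal sketch under assumptions, not the thesis's implementation: the function names, the choice of placing band edges uniformly on the warped axis, and the use of normalized frequency (radians per sample, since the abstract gives no sampling rate) are all introduced here for illustration. Only ρ = 0.48, ρ = 0.63 and the 16 bands come from the abstract.

```python
import math


def warp(omega, rho):
    """Map natural frequency omega (rad/sample) to the warped axis.

    Standard frequency map of the first-order all-pass
    (z^-1 - rho) / (1 - rho * z^-1).
    """
    return omega + 2.0 * math.atan2(rho * math.sin(omega),
                                    1.0 - rho * math.cos(omega))


def unwarp(nu, rho):
    """Inverse map: the same formula with the sign of rho flipped."""
    return warp(nu, -rho)


def warped_band_edges(num_bands, rho):
    """Band edges (rad/sample) that are uniformly spaced on the warped axis.

    Assumption for illustration: bands are uniform in the warped domain.
    """
    return [unwarp(math.pi * k / num_bands, rho) for k in range(num_bands + 1)]


def warped_band_centers(num_bands, rho):
    """Natural-frequency images of the warped-domain band midpoints."""
    return [unwarp(math.pi * (k + 0.5) / num_bands, rho)
            for k in range(num_bands)]


if __name__ == "__main__":
    # rho values and the 16-band count are taken from the abstract.
    for label, rho in (("Bark-like", 0.48), ("ERB-like", 0.63)):
        edges = warped_band_edges(16, rho)
        widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
        # Report widths as a fraction of the Nyquist frequency (pi rad/sample).
        print(label, "narrowest:", widths[0] / math.pi,
              "widest:", widths[-1] / math.pi)
```

With a positive ρ, the band widths in this sketch grow from low to high frequency, and each band's center lies below the arithmetic midpoint of its edges, which is one way to read the nonuniform, asymmetric bands described in the abstract.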
Keywords/Search Tags: Zero Crossing Peak Amplitude (ZCPA), FIR filter, Laguerre filter, Warped Filter Banks (WFBs), Optimized Filter Bank (OFB), Adaptive Bands Filter Bank (ABFB), Gammatone (GT) filter, OLdenburg LOgatome (OLLO) speech corpus