Noise robust front-end processing for automatic speech recognition

Posted on: 2002-09-02
Degree: Ph.D.
Type: Dissertation
University: University of California, Los Angeles
Candidate: Zhu, Qifeng
Full Text: PDF
GTID: 1468390011997426
Subject: Engineering
Abstract/Summary:
The performance of current automatic speech recognition (ASR) systems degrades greatly under noise. This dissertation focuses on the front-end approach to improving the noise robustness of ASR systems. Several novel algorithms are developed for feature extraction.

The first algorithm is variable frame rate analysis, which is inspired by human speech perception. It uses a high frame rate for rapidly changing segments of high energy and a low frame rate for relatively steady segments.

An analysis-based nonlinear feature extraction approach is then proposed, inspired by a quantitative model of how speech amplitude spectra are affected by additive noise. Acoustic features are extracted from the noise-robust parts of speech spectra without losing discriminative information. Two nonlinear processing algorithms, harmonic demodulation and spectral peak-to-valley ratio locking, are designed to minimize the mismatch between clean and noisy speech features. A previously studied method, peak isolation (Strope & Alwan, 1997), is also discussed in terms of this model. These algorithms do not require noise estimation and are effective against both stationary and non-stationary noise backgrounds. A noise removal algorithm derived directly from the additive noise model is also tested and compared with the other new algorithms in this dissertation and with the linear and nonlinear spectral subtraction methods.

The proposed front-end processing algorithms are tested in Hidden Markov Model (HMM) based speech recognition experiments with the TI46 database and the Aurora 2 database. Significant improvement is observed with these algorithms. For the TI46 isolated digits database, the average recognition rate across SNRs improves from 60% (for the widely used MFCC front-end) to 95% (using the proposed techniques) in the presence of additive speech-shaped noise. For the Aurora 2 connected digit-string database, the average recognition rate across SNRs and noise types, including non-stationary noise backgrounds, improves from 58% to 83%.

Finally, a DCT-based feature-coding scheme is proposed for distributed speech recognition. The coding scheme computes a 2D DCT on blocks of feature vectors, followed by uniform scalar quantization, run-length coding, and Huffman coding. Analysis and recognition experiments show that the 2D DCT can be an effective way of exploiting the inter-frame correlation of acoustic features.
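
The first stages of the DCT-based coding scheme described above (2D DCT on a block of feature vectors, uniform scalar quantization, run-length coding) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the block size, the quantizer step, the orthonormal DCT-II normalization, the row-major scan order, and all function names are assumptions chosen for illustration, and the final Huffman coding stage is omitted.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows (assumed normalization)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct2(block):
    """2D DCT of a (frames x coefficients) block: C_t * X * C_f^T."""
    ct = dct_matrix(len(block))
    cf = dct_matrix(len(block[0]))
    return matmul(matmul(ct, block), transpose(cf))

def idct2(coeffs):
    """Inverse of dct2; valid because both bases are orthonormal."""
    ct = dct_matrix(len(coeffs))
    cf = dct_matrix(len(coeffs[0]))
    return matmul(matmul(transpose(ct), coeffs), cf)

def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer index."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def dequantize(indices, step):
    return [[q * step for q in row] for row in indices]

def run_length(indices):
    """Row-major run-length encoding as (value, run_length) pairs."""
    pairs = []
    for q in (q for row in indices for q in row):
        if pairs and pairs[-1][0] == q:
            pairs[-1][1] += 1
        else:
            pairs.append([q, 1])
    return [tuple(p) for p in pairs]

if __name__ == "__main__":
    # Hypothetical block: 4 consecutive frames x 3 features, slowly varying in time.
    block = [[1.00, 0.50, 0.20],
             [1.02, 0.51, 0.21],
             [1.01, 0.49, 0.20],
             [0.99, 0.50, 0.19]]
    step = 0.1  # assumed quantizer step size
    indices = quantize(dct2(block), step)
    print(run_length(indices))
    print(idct2(dequantize(indices, step)))
```

When consecutive frames are strongly correlated, most of the block's energy lands in the low-order time coefficients, so many quantized indices become zero and the run-length stage has long runs to exploit.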
Keywords/Search Tags:Recognition, Noise, Speech, Front-end, Processing