Noise robust front-end processing for automatic speech recognition

Posted on:2002-09-02

Degree:Ph.D

Type:Dissertation

University:University of California, Los Angeles

Candidate:Zhu, Qifeng


GTID:1468390011997426

Subject:Engineering

Abstract/Summary:

The performance of current automatic speech recognition (ASR) systems degrades greatly under noise. This dissertation focuses on the front-end approach to improving the noise robustness of ASR systems. Several novel algorithms are developed for feature extraction.

The first algorithm is variable frame rate analysis, which is inspired by human speech perception. It uses a high frame rate for rapidly changing, high-energy segments and a low frame rate for relatively steady segments.

An analysis-based nonlinear feature extraction approach is also proposed, inspired by a quantitative model of how speech amplitude spectra are affected by additive noise. Acoustic features are extracted from the noise-robust parts of speech spectra without losing discriminative information. Two nonlinear processing algorithms, harmonic demodulation and spectral peak-to-valley ratio locking, are designed to minimize the mismatch between clean and noisy speech features. A previously studied method, peak isolation (Strope & Alwan, 1997), is also discussed in the context of this model. These algorithms do not require noise estimation and are effective with both stationary and non-stationary noise backgrounds. A noise removal algorithm derived directly from the additive noise model is also tested and compared with the other new algorithms in this dissertation and with linear and nonlinear spectral subtraction methods.

The proposed front-end processing algorithms are tested in Hidden Markov Model (HMM) based speech recognition experiments with the TI46 database and the Aurora 2 database. These algorithms yield significant improvements. For the TI46 isolated digits database, the average recognition rate across SNRs improves from 60% (for the widely used MFCC front-end) to 95% (using the proposed techniques) in the presence of additive speech-shaped noise. For the Aurora 2 connected digit-string database, the average recognition rate across SNRs and noise types, including non-stationary noise backgrounds, improves from 58% to 83%.

Finally, a DCT-based feature-coding scheme is proposed for distributed speech recognition. The coding scheme computes a 2D DCT on blocks of feature vectors, followed by uniform scalar quantization, run-length coding, and Huffman coding. Analysis and recognition experiments show that the 2D DCT can be an effective way to exploit the inter-frame correlation of acoustic features.
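
The first stages of the feature-coding scheme described in the abstract (2D DCT on a block of feature vectors, uniform scalar quantization, run-length coding) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the block size, the quantization step, the row-major scan order, and all function names are assumptions chosen for illustration, and the final Huffman coding stage is omitted.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct2(block):
    """2D DCT of a block: rows are frames (time), columns are features."""
    c_time = dct_matrix(len(block))
    c_feat = dct_matrix(len(block[0]))
    return matmul(matmul(c_time, block), transpose(c_feat))

def idct2(coeffs):
    """Inverse of dct2 (the bases are orthonormal, so inverse = transpose)."""
    c_time = dct_matrix(len(coeffs))
    c_feat = dct_matrix(len(coeffs[0]))
    return matmul(matmul(transpose(c_time), coeffs), c_feat)

def quantize(coeffs, step):
    """Uniform scalar quantization to integer indices (step is an assumption)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def dequantize(indices, step):
    return [[v * step for v in row] for row in indices]

def run_length(symbols):
    """Encode a flat list as (value, run_length) pairs."""
    pairs = []
    for s in symbols:
        if pairs and pairs[-1][0] == s:
            pairs[-1][1] += 1
        else:
            pairs.append([s, 1])
    return [tuple(p) for p in pairs]

if __name__ == "__main__":
    # Hypothetical block: 8 frames x 4 features that drift slowly over time.
    block = [[(j + 1) + 0.1 * t for j in range(4)] for t in range(8)]
    indices = quantize(dct2(block), step=0.5)
    flat = [v for row in indices for v in row]  # row-major scan (assumption)
    print(run_length(flat))
    print(idct2(dequantize(indices, step=0.5))[0])
```

When features change slowly from frame to frame, most of the block's energy lands in a few low-order coefficients along the time axis, so many quantized indices are zero and the run-length stage has long runs to compress. That is the inter-frame correlation the abstract refers to.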

Keywords/Search Tags:

Recognition, Noise, Speech, Front-end, Processing

Related items

1	Speech Recognition Front-End Processing Based On Deep Neural Network
2	Research On Noise-robust Speech Recognition Based On Feature Extraction
3	The Research Of Front-end Processing Technology Based On The Speaker-independent Speech Recognition
4	Noising Processing And Recognition Under The Car Noise Background Isolated Word Speech Signals
5	Distributed Speech Recognition And Voice XML Standard Language In Vivid-Ring Application
6	Front And Back Ends Signal Processing For Speech Codecs
7	Research On Key Technologies Of Digitizing Speech Signal Processing
8	Design And Implementation Of Noise Robust Speech Recognition Algorithm Based On Deep Learning
9	Anti-noise Technology Combined Denoising Method Based Speech Recognition Studies
10	High-performance automatic speech recognition via enhanced front-end analysis and acoustic modeling