On the robustness of static and dynamic spectral information for speech recognition in noise

Posted on:2006-10-14

Degree:Ph.D

Type:Thesis

University:The Chinese University of Hong Kong (People's Republic of China)

Candidate:Yang, Chen

GTID:2458390008465211

Subject:Engineering

Abstract/Summary:

Automatic speech recognition (ASR) technology has achieved a high level of performance in controlled laboratory environments, where background noise and channel variation are benign. In real-world applications, however, the performance of ASR systems may degrade greatly because of the mismatch between training and operating conditions.

In this thesis, we investigate the noise robustness of acoustic features in the cepstral domain, which are used successfully in most state-of-the-art ASR systems. We attempt to determine to what extent the recognition process can be made insensitive to noise by exploiting the unequal robustness of different feature components. Our approach requires neither adaptation of the acoustic models nor front-end compensation.

Dynamic cepstral features supplement static features by characterizing their temporal trajectory. It is widely known that the use of dynamic features improves speech recognition performance. However, few quantitative and systematic studies have examined the robustness of static and dynamic features for ASR in noise. In this research, by investigating the noise robustness of the static and dynamic cepstral features quantitatively, we find that the dynamic features are more robust to noise than their static counterparts. Accordingly, we propose a simple but effective noise-robust speech recognition strategy that exponentially weights the likelihoods of the static and dynamic features during decoding. A discriminative training procedure is developed to estimate the optimal feature weights automatically from a small amount of development data. This approach is evaluated on two connected-digit databases, one in English (Aurora 2) and the other in Cantonese (CUDigit). Using condition-specific weights, significant performance improvements over the conventional unweighted baseline recognition system are attained under a variety of noise conditions. The overall relative word error rate (WER) reductions are 36.55% and 41.92% for Aurora 2 and CUDigit, respectively. The proposed approach is appealing for practical applications because: (1) noise estimation is not required for feature compensation; (2) adaptation of HMMs to noisy environments is not required; (3) only a minor modification of the decoding process is needed; (4) only a few feature weights need to be trained. (Abstract shortened by UMI.)
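
The exponential weighting of static and dynamic likelihoods described in the abstract can be illustrated with a short sketch. The abstract does not give the exact formulation, so the following is only an assumption: each HMM state is modeled by a single diagonal-covariance Gaussian, the feature vector is split into a static part and a dynamic part, and the two log-likelihoods are combined as w_s * log p(static) + w_d * log p(dynamic), which is what raising each likelihood to its weight amounts to in the log domain. The function names, the parameters `n_static`, `w_static`, and `w_dynamic`, and the example numbers are all hypothetical and are not taken from the thesis.

```python
import numpy as np


def diag_gaussian_loglik(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var))


def weighted_stream_loglik(obs, mean, var, n_static, w_static=1.0, w_dynamic=1.0):
    """Weighted state log-likelihood (illustrative assumption, not the thesis code).

    The first n_static dimensions are treated as static cepstra and the
    remaining ones as dynamic (delta) features. Exponentially weighting the
    two likelihoods becomes a weighted sum in the log domain.
    """
    obs, mean, var = (np.asarray(a, dtype=float) for a in (obs, mean, var))
    ll_static = diag_gaussian_loglik(obs[:n_static], mean[:n_static], var[:n_static])
    ll_dynamic = diag_gaussian_loglik(obs[n_static:], mean[n_static:], var[n_static:])
    return w_static * ll_static + w_dynamic * ll_dynamic


# Toy 4-dimensional frame: 2 static + 2 dynamic coefficients (made-up values).
obs = np.array([0.5, -1.0, 0.2, 0.1])
mean = np.zeros(4)
var = np.ones(4)

# Equal weights of 1.0 reproduce the ordinary unweighted log-likelihood.
baseline = weighted_stream_loglik(obs, mean, var, n_static=2)
# Down-weighting the static stream makes the score rely more on dynamic features.
noisy_condition = weighted_stream_loglik(obs, mean, var, n_static=2, w_static=0.3)
```

With diagonal covariances the unweighted log-likelihood is already the sum of the static and dynamic terms, so setting both weights to 1.0 recovers the baseline decoder score. In this sketch, only the state scoring changes while the rest of the decoder is left as it is.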

Keywords/Search Tags:

Speech recognition, Noise, Dynamic, ASR, Robustness, Performance
