On the robustness of static and dynamic spectral information for speech recognition in noise

Posted on: 2006-10-14
Degree: Ph.D
Type: Thesis
University: The Chinese University of Hong Kong (People's Republic of China)
Candidate: Yang, Chen
Full Text: PDF
GTID: 2458390008465211
Subject: Engineering
Abstract/Summary:
Automatic speech recognition (ASR) technology has achieved a high level of performance in controlled laboratory environments, where background noise and channel variation are rather benign. In real-world applications, however, the performance of ASR systems may degrade greatly because of the mismatch between the training and operating conditions.

In this thesis, we investigate the noise robustness of acoustic features in the cepstral domain, which have been used successfully in most state-of-the-art ASR systems. We attempt to discern to what extent the recognition process can be made insensitive to noise by exploiting the unequal robustness of different feature components. Our approach requires neither adaptation of the acoustic models nor front-end compensation.

Dynamic cepstral features supplement the static features by characterizing their temporal trajectories. It is widely known that the use of dynamic features improves speech recognition performance. However, few quantitative and systematic studies have examined the robustness of static and dynamic features for ASR in noise. In this research, by investigating the noise robustness of the static and dynamic cepstral features quantitatively, we find that the dynamic features are more robust to noise than their static counterparts. Accordingly, we propose a simple but effective noise-robust speech recognition strategy that exponentially weights the likelihoods of the static and dynamic features during decoding. A discriminative training procedure is developed to estimate the optimal feature weights automatically from a small amount of development data. This approach is evaluated on two connected-digit databases, one in English (Aurora 2) and the other in Cantonese (CUDigit). Using condition-specific weights, it attains significant performance improvements over the conventional unweighted baseline recognition system under a variety of noise conditions. The overall relative Word Error Rate (WER) reductions are 36.55% and 41.92% for Aurora 2 and CUDigit, respectively. The proposed approach is appealing for practical applications because: (1) noise estimation is not required for feature compensation; (2) adaptation of HMMs to noisy environments is not required; (3) only a minor modification of the decoding process is needed; (4) only a few feature weights need to be trained. (Abstract shortened by UMI.)...
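
The exponential weighting of static and dynamic likelihoods described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a single diagonal-covariance Gaussian per stream and per HMM state (the abstract does not specify the output density), and the function names, the `state` dictionary layout, and the weight values are hypothetical. Raising each stream likelihood to a power and multiplying is equivalent, in the log domain, to a weighted sum of the stream log-likelihoods.

```python
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-density of vector x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def weighted_stream_loglik(static_ll, dynamic_ll, w_static, w_dynamic):
    """Log of p_static**w_static * p_dynamic**w_dynamic.

    Exponential weighting of likelihoods becomes a weighted sum
    of log-likelihoods.
    """
    return w_static * static_ll + w_dynamic * dynamic_ll


def score_frame(static_x, dynamic_x, state, w_static=1.0, w_dynamic=1.0):
    """Score one frame against one state (hypothetical state layout).

    With both weights equal to 1.0 this reduces to the unweighted
    baseline, i.e. the plain sum of the two stream log-likelihoods.
    """
    s_ll = diag_gaussian_loglik(
        static_x, state["static_mean"], state["static_var"]
    )
    d_ll = diag_gaussian_loglik(
        dynamic_x, state["dynamic_mean"], state["dynamic_var"]
    )
    return weighted_stream_loglik(s_ll, d_ll, w_static, w_dynamic)
```

In a decoder, a function like `score_frame` would replace the per-frame state score, which is consistent with the abstract's point that only a minor modification of decoding is needed. Choosing `w_static` smaller than `w_dynamic` de-emphasizes the less noise-robust static stream. The discriminative procedure for estimating the weights is not shown here.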
Keywords/Search Tags: Speech recognition, Noise, Dynamic, ASR, Robustness, Performance