
Robust speech recognition in a car using a microphone array

Posted on: 2007-04-06
Degree: Ph.D.
Type: Thesis
University: University of Illinois at Urbana-Champaign
Candidate: Lee, Bowon
Full Text: PDF
GTID: 2448390005465557
Subject: Engineering
Abstract/Summary:
The performance of automatic speech recognition relies on a vast amount of training speech data, mostly recorded with little or no background noise. Performance degrades significantly with background noise, which increases the mismatch between training and test environments. Speech enhancement techniques can reduce this mismatch.

At very low SNR with nonstationary noise, the enhanced speech may still contain significant noise in either noise-only segments or speech segments. The former masquerades as nonexistent speech and the latter as distorted speech. Both significantly degrade the performance of the automatic speech recognizer. This encourages the use of voice activity detection (VAD) algorithms to determine the regions where speech is present. To use only the reliable speech features, we need to further determine whether the features from a speech region come mainly from speech or from nonstationary noise masking the speech. For more robust speech recognition, this thesis proposes a three-hypothesis VAD consisting of H0: noise-only region; HS: speech-dominant speech region; and HN: noise-dominant speech region.

Spectrum-based VAD uses knowledge of the noise spectrum to detect voice activity, exploiting the nonstationary nature of speech. This thesis proposes a method of estimating the instantaneous noise spectrum for VAD. The spectrum-based VAD, however, cannot distinguish speech from nonstationary noise, because both appear nonstationary to the VAD and thus look like speech. A microphone array can determine the noise-corrupted speech region when the nonstationary noise comes from a location other than that of the speech source. This thesis proposes a method of distinguishing HS from HN based on the steered response power (SRP) method, which estimates the power arriving from any location.

Phonemic restoration is a phenomenon in which humans claim to hear missing phonemes that have been replaced by noise. Given strong nonstationary noises that occasionally mask the speech region, as well as knowledge of HS and HN, this thesis proposes a phoneme restoration approach for automatic speech recognition in the hidden Markov model framework.

The proposed approach has two steps: speech enhancement as a preprocessor for noisy speech signals, followed by phoneme restoration for speech recognition that is robust against nonstationary noises, given knowledge of HS and HN.
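
As a rough illustration of the three-hypothesis decision described in the abstract, the sketch below labels one frame as H0, HS, or HN. It is not the method of the thesis: it assumes a plain energy threshold in place of the spectrum-based VAD, and a time-domain delay-and-sum power with integer sample delays in place of the thesis's SRP estimator. The function names, the `vad_ratio` threshold, and the delay tuples are all hypothetical.

```python
from typing import List, Sequence


def steered_power(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> float:
    """Delay-and-sum power for one candidate location.

    Each channel is advanced by its (assumed, integer) delay so that a source
    at the candidate location adds coherently; the result is the mean squared
    value of the channel average.
    """
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    total = 0.0
    for t in range(n):
        s = sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        total += s * s
    return total / n


def classify_frame(
    channels: Sequence[Sequence[float]],
    noise_power: float,
    speech_delays: Sequence[int],
    other_delays: List[Sequence[int]],
    vad_ratio: float = 2.0,  # assumed threshold, not taken from the thesis
) -> str:
    """Return 'H0' (noise only), 'HS' (speech dominant) or 'HN' (noise dominant)."""
    # Step 1: crude energy VAD against a known stationary noise power.
    frame_power = sum(
        sum(x * x for x in ch) / len(ch) for ch in channels
    ) / len(channels)
    if frame_power < vad_ratio * noise_power:
        return "H0"
    # Step 2: compare power steered at the talker with power steered elsewhere.
    p_speech = steered_power(channels, speech_delays)
    p_other = max(steered_power(channels, d) for d in other_delays)
    return "HS" if p_speech >= p_other else "HN"
```

For a two-microphone array, a call such as `classify_frame([mic0, mic1], noise_power, (0, 0), [(0, 3)])` would treat `(0, 0)` as the talker's location and `(0, 3)` as one competing location; in practice the delays would follow from the array geometry and the candidate source positions.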
Keywords/Search Tags:Speech, Nonstationary, Microphone array, Spectrum-based VAD, Phoneme restoration, Thesis proposes