
Perceptually inspired signal processing strategies for robust speech recognition in reverberant environments

Posted on: 1999-07-05
Degree: Ph.D
Type: Dissertation
University: University of California, Berkeley
Candidate: Kingsbury, Brian E. D
Full Text: PDF
GTID: 1468390014971363
Subject: Computer Science
Abstract/Summary:
Natural, hands-free interaction with computers is currently one of the great unfulfilled promises of automatic speech recognition (ASR), in part because ASR systems cannot reliably recognize speech under everyday, reverberant conditions that pose no problems for most human listeners. The specific properties of the auditory representation of speech likely contribute to reliable human speech recognition under such conditions. This dissertation explores the use of perceptually inspired signal-processing strategies in an ASR system to improve robustness to reverberation: critical-band-like frequency analysis, an emphasis on slow changes in the spectral structure of the speech signal, adaptation, integration of phonetic information over syllabic durations, and use of multiple signal representations for recognition. The implementation of these strategies was optimized in a series of experiments on a small-vocabulary, continuous speech recognition task. The resulting speech representation, called the modulation-filtered spectrogram (MSG), provided relative improvements of 15 to 30% over a baseline recognizer in reverberant conditions, and also outperformed the baseline in other acoustically challenging conditions. The MSG and baseline recognizers may be combined to obtain more accurate recognition than is possible with either recognizer alone. Preliminary tests with the Broadcast News corpus indicate that the MSG representation is useful for large-vocabulary tasks as well.
Keywords/Search Tags: Speech recognition, ASR, MSG, Signal, Reverberant