Audio-visual speech recognition for difficult environments

Posted on: 2003-01-27
Degree: Ph.D
Type: Dissertation
University: Clemson University
Candidate: Patterson, Eric Kendall
Full Text: PDF
GTID: 1468390011489067
Subject: Computer Science
Abstract/Summary:
The work presented in this dissertation focuses on audio-visual speech recognition for difficult environments where background noise, speaker movement, or non-user speakers may cause degradation of performance. There are four main parts to this work. The first involves a study of data fusion for optimal recognition in noisy environments. Audio-visual speech recognition using “late integration” is investigated under various types and levels of background noise. The second involves the creation of a multi-modal speech database to facilitate research in this area. The speech corpus includes continuous and connected digits spoken by a wide variety of stationary and moving speakers as well as speaker pairs. The third part is a feature study using moving speakers from the database. An image-processing-based method, an image-transform technique, and a deformable-template method are compared and tested for invariance to speaker movement. The final part investigates using an audio-visual approach to improve speech recognition among multiple, simultaneous speakers. Overall, the addition of visual features is shown to improve upon audio-only performance in noisy and multispeaker environments, and techniques are presented that yield improved speech-reading performance for moving talkers.
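
As an illustration of the "late integration" idea mentioned above, the following is a minimal sketch in which an audio recognizer and a visual recognizer each score every candidate word independently, and the scores are combined only at the decision stage. The abstract does not specify the dissertation's actual fusion rule; the weighted sum of log-likelihoods used here is one common form of late integration and is an assumption, as are the function name `late_fusion`, the `audio_weight` parameter, and the example scores.

```python
def late_fusion(audio_loglik, visual_loglik, audio_weight):
    """Fuse per-word log-likelihoods from two independent recognizers.

    Hypothetical sketch: audio_weight in [0, 1] expresses how much the
    audio stream is trusted; the visual stream gets the remainder.
    Returns the best-scoring word and the fused score for every word.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    fused = {
        word: audio_weight * audio_loglik[word]
        + visual_weight * visual_loglik[word]
        for word in audio_loglik
    }
    best_word = max(fused, key=fused.get)
    return best_word, fused


# Made-up scores for two acoustically confusable digits. The audio
# stream (imagined as noisy) slightly prefers "nine"; the visual stream
# clearly prefers "one".
audio_scores = {"one": -10.0, "nine": -9.5}
visual_scores = {"one": -4.0, "nine": -8.0}

audio_only, _ = late_fusion(audio_scores, visual_scores, audio_weight=1.0)
balanced, _ = late_fusion(audio_scores, visual_scores, audio_weight=0.5)
print(audio_only, balanced)
```

In a sketch like this, the weight would typically be lowered as background noise increases, so that the decision leans more heavily on the visual stream when the audio is unreliable.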
Keywords/Search Tags: Speech, Environments, Speaker