
Automatic speechreading for improved speech recognition and speaker verification

Posted on: 2003-03-07
Degree: Ph.D
Type: Thesis
University: Georgia Institute of Technology
Candidate: Zhang, Xiaozheng
Full Text: PDF
GTID: 2468390011989682
Subject: Engineering
Abstract/Summary:
This thesis addresses two related problems in an automatic speechreading system: visual speech feature extraction and audio-visual integration. Two applications that exploit speechreading in a joint audio-visual speech signal processing task are developed: audio-visual speech recognition and biometric speaker verification.

A color-based visual feature extraction algorithm is proposed. The algorithm first locates the mouth region reliably, using color and motion information from a color video sequence of the speaker's frontal view. It then segments the lip region within a Markov random field framework and derives a relevant set of visual speech parameters. By extracting an expanded set of visual speech features, this visual front end achieves higher accuracy on a speech recognition task than previous approaches. Experimental results on a speaker verification task also demonstrate that visual speech information is highly effective at reducing error rates relative to acoustic information alone.

A new audio-visual fusion model based on a coupled hidden Markov model (CHMM) is also proposed. The model captures the temporal correlations between the audio and visual streams by allowing asynchrony between the two sources while preserving their temporal coupling. The CHMM is shown to outperform other existing integration models, and the performance benefit of the visual modality is observed both on clean speech and under noisy conditions.
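The color-based mouth localization can be illustrated with a small sketch. This is not the thesis's algorithm (which also uses motion cues and an MRF lip segmentation); it only shows the core color idea: lip pixels tend to have a higher red-dominance "pseudo-hue" r/(r+g) than surrounding skin, so thresholding that ratio and taking the bounding box of the surviving pixels yields a candidate mouth region. The 0.6 threshold and the toy pixel values are assumptions for illustration.

```python
# Illustrative sketch only (not the thesis's exact method): locate a
# mouth-like region by color.  Lips score higher than skin on the
# red-dominance ratio r/(r+g); thresholding this ratio and bounding
# the detected pixels gives a candidate mouth region.

def pseudo_hue(pixel):
    """Red-dominance ratio r/(r+g); lip pixels score higher than skin."""
    r, g, _b = pixel
    return r / (r + g) if (r + g) > 0 else 0.0

def locate_mouth(image, threshold=0.6):
    """Bounding box (top, left, bottom, right) of pixels whose
    pseudo-hue exceeds the threshold, or None if none do."""
    rows = [y for y, row in enumerate(image)
            for pix in row if pseudo_hue(pix) > threshold]
    cols = [x for row in image
            for x, pix in enumerate(row) if pseudo_hue(pix) > threshold]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))

# Toy 6x6 frame: skin pixels with a 2x3 "lip" patch in the middle
# (made-up RGB values for illustration).
skin, lip = (180, 130, 110), (200, 80, 70)
frame = [[skin] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (1, 2, 3):
        frame[y][x] = lip

print(locate_mouth(frame))  # bounding box of the lip patch
```

In a real system this coarse box would only seed the subsequent segmentation step; the color cue alone is sensitive to lighting and skin-tone variation, which is why the thesis combines it with motion information and an MRF framework.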
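The coupling idea behind the CHMM can also be sketched in miniature. In a coupled HMM, each stream's next state is conditioned on the previous states of *both* streams, so the audio and visual chains may drift locally out of sync while remaining temporally coupled. The sketch below runs a forward pass over the joint state space of two 2-state chains; all probabilities are made-up illustration values, not parameters from the thesis.

```python
# Minimal coupled-HMM sketch (illustrative numbers only): two 2-state
# chains, audio (a) and visual (v).  Each chain's transition conditions
# on BOTH chains' previous states -- the defining CHMM coupling.

states = [(a, v) for a in (0, 1) for v in (0, 1)]

# Uniform initial distribution over joint states.
init = {s: 0.25 for s in states}

def p_a(a, prev_a, prev_v):
    """P(a_t | a_{t-1}, v_{t-1}): audio chain, coupled to visual."""
    stay = 0.8 if prev_a == prev_v else 0.6
    return stay if a == prev_a else 1 - stay

def p_v(v, prev_a, prev_v):
    """P(v_t | a_{t-1}, v_{t-1}): visual chain, coupled to audio."""
    stay = 0.7 if prev_a == prev_v else 0.5
    return stay if v == prev_v else 1 - stay

# Discrete per-stream emission probabilities P(obs | state).
emit_a = {0: {'lo': 0.9, 'hi': 0.1}, 1: {'lo': 0.2, 'hi': 0.8}}
emit_v = {0: {'closed': 0.8, 'open': 0.2}, 1: {'closed': 0.3, 'open': 0.7}}

def forward(obs):
    """Likelihood of a sequence of (audio_obs, visual_obs) pairs."""
    oa, ov = obs[0]
    alpha = {(a, v): init[(a, v)] * emit_a[a][oa] * emit_v[v][ov]
             for (a, v) in states}
    for oa, ov in obs[1:]:
        alpha = {(a, v): emit_a[a][oa] * emit_v[v][ov] *
                 sum(alpha[(pa, pv)] * p_a(a, pa, pv) * p_v(v, pa, pv)
                     for (pa, pv) in states)
                 for (a, v) in states}
    return sum(alpha.values())

seq = [('lo', 'closed'), ('lo', 'closed'), ('hi', 'open')]
print(forward(seq))  # joint likelihood of the short audio-visual sequence
```

Contrast this with a product HMM over a single merged state, which forces the streams into lockstep: the cross-conditioned transitions here are what let one modality lead or lag the other by a few frames without losing their statistical dependence.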
Keywords/Search Tags: Speech, Visual, Speaker verification, Feature extraction