Acoustic modeling and speaker normalization strategies with application to robust in-vehicle speech recognition and dialect classification

Posted on:2006-09-12

Degree:Ph.D

Type:Dissertation

University:University of Colorado at Boulder

Candidate:Yapanel, Umit

GTID:1458390005495238

Subject:Engineering

Abstract/Summary:

The speech signal carries several levels of blended information: the linguistic level, such as the spoken message (content), language, and accent/dialect; the speaker-specific level, such as gender, emotion, stress, age, physical size (of the speaker's vocal tract), and speaker identity; and environmental characteristics, such as communication channel frequency response, microphone/recording media, and background noise. This dissertation focuses on improved automatic speech recognition in noisy and dialect speech conditions. Specifically, improved acoustic modeling is considered for in-vehicle environments. Reducing inter-speaker variability within the feature set to increase recognition performance is also considered. Finally, the proposed algorithms are applied to the dialect classification problem.

The first phase develops new front-ends for speech recognition in noisy car environments. We propose two new acoustic front-ends based on the minimum variance distortionless response (MVDR) method. The primary contribution is the formulation of a novel perceptual MVDR-based feature, the PMVDR front-end. We show that the PMVDR front-end outperforms previously proposed MVDR-based front-ends on standardized speech recognition tasks.

The second phase proposes a Built-In Speaker Normalization (BISN) algorithm, which is similar to traditional Vocal Tract Length Normalization (VTLN) but integrates several improvements to the search stage that reduce its computational cost. An on-the-fly version is then introduced and evaluated within the PMVDR framework. This implementation makes it possible to employ speaker normalization seamlessly within the front-end and to re-apply or refine the normalization as more speaker data becomes available.

In the final section, the proposed PMVDR acoustic front-end and BISN speaker normalization algorithm are applied to dialect classification. Since dialect differences in speech are observable at the phoneme level, the proposed classification algorithms can benefit from a better acoustic front-end. Moreover, as speaker variability is reduced, dialect-dependent traits of the input speech are expected to become more dominant, thereby improving classification performance.

The formulated algorithms are shown to be valuable for speech parameterization and speaker normalization in real-world tasks. Moreover, their successful application to a different speech classification problem, dialect classification, confirms their importance and potential long-term impact in the field of speech processing and language technology, beyond the problem of robust automatic speech recognition.
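
For readers unfamiliar with the traditional VTLN search that the abstract uses as its point of comparison, the following is a minimal sketch, not the dissertation's BISN algorithm. It assumes a piecewise-linear frequency warp and an exhaustive grid search over warp factors; the function names, the 0.88 to 1.12 grid, the 8 kHz Nyquist frequency, the 0.85 knee ratio, and the `score` callable (standing in for an acoustic-model likelihood) are all illustrative assumptions that do not come from the source.

```python
from typing import Callable, Optional, Sequence


def piecewise_linear_warp(freq_hz: float, alpha: float,
                          nyquist_hz: float = 8000.0,
                          knee_ratio: float = 0.85) -> float:
    """Warp one frequency by factor alpha (hypothetical helper).

    Below the knee the axis is scaled by alpha; above it, a second linear
    segment is used so that the Nyquist frequency maps onto itself.
    """
    knee_hz = knee_ratio * nyquist_hz
    if freq_hz <= knee_hz:
        return alpha * freq_hz
    slope = (nyquist_hz - alpha * knee_hz) / (nyquist_hz - knee_hz)
    return alpha * knee_hz + slope * (freq_hz - knee_hz)


def search_warp_factor(score: Callable[[float], float],
                       alphas: Optional[Sequence[float]] = None) -> float:
    """Exhaustive grid search: return the alpha with the highest score.

    `score` is an assumed callable, e.g. the log-likelihood of a speaker's
    utterances after features are re-extracted with that warp factor.
    """
    if alphas is None:
        # Assumed grid: 13 candidates from 0.88 to 1.12 in steps of 0.02.
        alphas = [round(0.88 + 0.02 * i, 2) for i in range(13)]
    return max(alphas, key=score)


if __name__ == "__main__":
    # Toy score that peaks at 1.04, standing in for a real likelihood.
    best = search_warp_factor(lambda a: -(a - 1.04) ** 2)
    print(best, piecewise_linear_warp(1000.0, best))
```

Each candidate in such a grid requires re-extracting and re-scoring features, so the cost grows with the number of candidates; this is the kind of search-stage cost the abstract says BISN reduces.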

Keywords/Search Tags:

Speech, Speaker normalization, Dialect, Acoustic, Classification, PMVDR

Related items

1	Advancements in robust algorithm formulation for dialect and speaker recognition
2	Neural dynamics of speech perception and production: From speaker normalization to apraxia of speech
3	Acoustic-feature-based frequency warping for speaker normalization
4	Research On Acoustic Analysis And Speech Synthesis For Lanzhou-Dialect
5	Research On Dialect Accent Classification Based On Deep Learning
6	Speaker Adaptation Of DNN-HMM Acoustic Model For Speech Recognition
7	Automatic dialect classification: Advances for read and spontaneous speech, and printed text
8	Research On Text Dependent Speaker Recognition For Tibetan Amdo Dialect
9	Speech Enhancement Method For Tibetan Speech Recognition In Lhasa Dialect
10	Research On Speech Synthesis Of Shanghai Dialect Based On Deep Learning