Accent and speaker recognition for advanced automatic speech recognition

Posted on: 2005-07-28
Degree: Ph.D
Type: Thesis
University: University of Colorado at Boulder
Candidate: Angkititrakul, Pongtep
Full Text: PDF
GTID: 2458390008992122
Subject: Engineering
Abstract/Summary:
The speech signal conveys many levels of information, including linguistic information (e.g., text, language, accent/dialect), speaker-specific information (e.g., gender, emotion, speaker identity), and environmental information (e.g., communication channels, background noise). This dissertation focuses on speech-pattern recognition for the detection of foreign accent and speaker identity.

The first thesis goal addresses the problem of computer-based automatic speech accent classification. A phone-based accent classification framework is developed which makes a decision based on the likelihood scores from pre-defined accent classes. Novel spectral trajectory modeling techniques are applied to estimate accent-sensitive acoustic traits over whole phoneme segments, in an effort to capture the spectral evolution of speech better than conventional Hidden Markov Model methods. Integrated feature-space transformations are applied for dimensionality reduction and better discrimination among accent classes. Furthermore, the open-set accent detection problem, which aims to detect native and non-native speech when no pre-defined system model exists for the specific accent, is explored for the first time. Comparable performance is achieved for most open accents using a closed set of four accent models.

The second thesis goal addresses the problem of in-set/out-of-set speaker recognition, where the task is to decide whether a speaker belongs to a group of pre-defined speakers. An effective algorithm is developed which employs spectral-based features within a Gaussian Mixture Model - Universal Background Model framework, enhanced by discriminative adaptation based on modified minimum classification error and minimum verification error criteria. Alternative speaker rejection criteria based on the distribution of the in-set speakers' discriminative score space are introduced and compared with the conventional log-likelihood ratio test. This represents the first published study addressing in-set speaker recognition.

Finally, the thesis concludes with a demonstration of the proposed algorithms for spoken document retrieval (SDR) using a collection of historical audio materials from the National Gallery of the Spoken Word. Results show that accent classification and in-set speaker recognition can successfully be integrated into an application for rich transcript generation in SDR. Collectively, the advances demonstrated in this research add new directions for future development in automatic accent classification, speaker recognition, improved robustness in speech recognition, and next-generation human-computer spoken language technology.
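
The conventional log-likelihood ratio test mentioned above as the baseline for in-set/out-of-set decisions can be sketched roughly as follows. This is a minimal illustration, not the thesis's algorithm: it omits the discriminative adaptation and the alternative rejection criteria, and every name, toy model, and threshold below (`in_set_decision`, `SPEAKERS`, `UBM`, `threshold=0.0`) is a hypothetical choice made for the example.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log p(x | gmm), where gmm is a list of (weight, mean, var) components."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one model."""
    return sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)

def in_set_decision(frames, speaker_models, ubm, threshold=0.0):
    """Score the best-matching in-set speaker against the background model.

    Returns (accepted, best_speaker, llr). The utterance is accepted as
    in-set when the log-likelihood ratio reaches the threshold.
    """
    scores = {name: avg_log_likelihood(frames, gmm)
              for name, gmm in speaker_models.items()}
    best = max(scores, key=scores.get)
    llr = scores[best] - avg_log_likelihood(frames, ubm)
    return llr >= threshold, best, llr

# Toy one-dimensional models (assumed values, for illustration only).
SPEAKERS = {
    "spk_a": [(1.0, [0.0], [1.0])],
    "spk_b": [(1.0, [4.0], [1.0])],
}
UBM = [(1.0, [2.0], [16.0])]  # broad background model

if __name__ == "__main__":
    print(in_set_decision([[0.1], [-0.2], [0.3]], SPEAKERS, UBM))
    print(in_set_decision([[10.0], [9.5], [10.5]], SPEAKERS, UBM))
```

In this toy setup, frames close to one of the in-set speaker models score higher under that model than under the broad background model and are accepted, while frames far from every in-set model are explained better by the background model and are rejected. In practice the threshold would be tuned on held-out data rather than fixed at zero.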
Keywords/Search Tags: Accent, Speaker, Speech, Automatic