Language identification using Gaussian mixture models

Posted on: 2003-07-19
Degree: Ph.D.
Type: Dissertation
University: Michigan State University
Candidate: Torres-Carrasquillo, Pedro A.
Full Text: PDF
GTID: 1468390011484705
Subject: Engineering
Abstract/Summary:
With the increasing globalization of speech information access and interfaces, applications of automatic language identification (LID), the task of identifying the language of a spoken utterance, are growing. These applications include routing callers to operators who speak their language and selecting language-dependent speech recognizers. Current state-of-the-art systems rely on phoneme recognition followed by n-gram language modeling to perform the identification task, and they achieve high recognition accuracy. However, phoneme recognition can be computationally burdensome and difficult to adapt rapidly for some applications. In this work, we examine a more general and computationally efficient alternative that uses Gaussian mixture models (GMMs) to capture both acoustic and phonetic structure information. In the system developed, the state sequence tokenization of the speech features through the GMM replaces the phoneme recognition tokenization. Additionally, the acoustic match score of the speech features to the GMM, which comes at no additional computational cost, is used with the state sequence tokens to further improve performance. On the CallFriend corpus, the system achieves a 12-way closed-set error rate of 18.6%, compared to 21.5% for the state-of-the-art phoneme recognition system.
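The tokenization idea in the abstract can be illustrated with a minimal sketch. It assumes, as an interpretation not spelled out in the abstract, that the "state" token for each frame is the index of the highest-scoring mixture component of a diagonal-covariance GMM, and that the acoustic match score is the average per-frame log-likelihood under the same GMM. The function names, the two-component toy model, and the four toy frames are all hypothetical; the features, model sizes, and language models used in the dissertation are not given here.

```python
import math
from collections import Counter


def component_log_likelihoods(frame, weights, means, variances):
    """Return log(w_k * N(frame | mean_k, diag(var_k))) for every component k."""
    scores = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        scores.append(log_p)
    return scores


def tokenize(frames, weights, means, variances):
    """Map each frame to its top-scoring component and accumulate the acoustic score.

    The per-component scores are computed once and reused for both outputs,
    which is why the acoustic score adds no extra likelihood evaluations.
    """
    tokens = []
    total_log_likelihood = 0.0
    for frame in frames:
        scores = component_log_likelihoods(frame, weights, means, variances)
        best = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(best)
        # Log-sum-exp over components gives the frame log-likelihood.
        peak = scores[best]
        total_log_likelihood += peak + math.log(
            sum(math.exp(s - peak) for s in scores)
        )
    return tokens, total_log_likelihood / len(frames)


def bigram_counts(tokens):
    """Count adjacent token pairs, the raw statistics for a bigram language model."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical two-component GMM over 2-dimensional features.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [2.9, 3.1], [3.2, 2.8], [-0.1, 0.0]]

tokens, avg_log_likelihood = tokenize(frames, weights, means, variances)
bigrams = bigram_counts(tokens)
```

In a full system of this kind, one would presumably train an n-gram model per language on such token sequences and combine its score with the acoustic score; that combination step is left out of the sketch because the abstract does not describe it.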
Keywords/Search Tags: Language, Phoneme recognition, Identification, Speech, State