Model selection based speaker adaptation and its application to nonnative speech recognition

Posted on: 2004-10-02  Degree: Ph.D.  Type: Dissertation
University: University of Missouri - Columbia  Candidate: He, Xiaodong  Full Text: PDF
GTID: 1468390011464507  Subject: Computer Science
Abstract/Summary:
Rapid globalization requires speech recognition systems to handle not only speech from native speakers but also speech from foreign speakers. Currently, most American English speech recognition systems are built from speech data of native speakers of American English. Although these systems work very well for native speakers, their performance degrades dramatically on foreign-accented speech. Moreover, because of the wide variety of foreign accents, the differing levels of English proficiency, and the limited data available, it is generally difficult to train a specific acoustic model for each foreign accent. A practically feasible way to improve nonnative speech recognition is therefore fast model adaptation.

In this dissertation, the problem of adapting acoustic models of native English speech to nonnative speakers is addressed from the perspective of adaptive model selection. The goal is to dynamically select the optimal model for each nonnative talker so as to balance model robustness to pronunciation variations against the model detail needed to discriminate speech sounds. A maximum expected likelihood (MEL) based technique is proposed for reliable model selection when adaptation data is sparse: the expected log-likelihood (EL) of the adaptation data is computed from the distributions of mismatch biases between model and data, and the model that maximizes EL is selected. Moreover, to obtain reliable results when the available data is very limited, an improved prior-knowledge-guided MEL (P-MEL) approach is proposed, which uses maximum a posteriori (MAP) estimation of the bias distributions. These model selection methods are further combined with maximum likelihood linear regression (MLLR) to enable adaptation of both the structure and the parameters of acoustic models.

Experiments were performed on data from speakers with a wide range of foreign accents. Results show that MEL-based model selection can dynamically select a proper model according to the available adaptation data, and that the P-MEL approach achieves good performance even when the amount of data is very small. Compared with standard MLLR, the MEL+MLLR and P-MEL+MLLR methods led to consistent and significant improvements in recognition accuracy on nonnative speakers, without performance degradation on native speakers.
Keywords/Search Tags:Recognition, Native, Speech, Model, Speakers, Adaptation, MLLR, Data