
Research On Speaker Adaptation In Speech Recognition

Posted on: 2008-02-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 1118360215983679
Subject: Signal and Information Processing
Abstract/Summary:
Effective and fast algorithms have made continuous speech recognition systems practical. However, when there is a mismatch between the test speaker and the training speakers, recognition performance degrades severely. Speaker adaptation techniques aim to improve recognition performance in the test environment using only a small amount of data. This thesis focuses on speaker adaptation for our large vocabulary continuous speech recognition (LVCSR) system. The research and contributions are described in detail as follows:

1. Comparison of MAP and MLLR Algorithms

Two classical model-based adaptation algorithms, MLLR (maximum likelihood linear regression) and MAP (maximum a posteriori), are discussed in the thesis. Experimental results show that both methods improve recognition results over the baseline system. The difference between the two algorithms is that MAP has desirable asymptotic properties, while MLLR has better convergence properties. In MAP, the prior knowledge of the model parameters influences the effect of speaker adaptation, and in MLLR, the choice of regression classes also influences the final results, so both factors are discussed. Further research focuses on an adaptation strategy that combines the two algorithms; the experimental results show that the combined method outperforms either algorithm alone. A unified view of feature-space normalization algorithms and the acoustic-model-based MLLR algorithm is also presented.

2. An Improvement of Clustering-Based Speaker Adaptation

A new measure for speaker clustering based on the cross likelihood ratio (CLR) is proposed. During recognition, an effective way to improve adaptation is to exploit the correlation between the test speaker and the training speakers, and to make full use of the available adaptation data and training data.
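
To make the cross likelihood ratio concrete, the following is a minimal sketch of one common data-based form of CLR between two speakers, normalized by a background model. It is not the thesis's implementation: the abstract does not give the exact formula, and the function names (`log_gauss_diag`, `gmm_avg_loglik`, `clr`), the diagonal-covariance GMM representation, and the one-dimensional toy data are all assumptions made for illustration. A higher score means the two speakers are acoustically closer.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood under a GMM.

    gmm is a list of (weight, mean, var) tuples (hypothetical layout).
    """
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(logs)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total / len(frames)

def clr(frames_a, gmm_a, frames_b, gmm_b, ubm):
    """Cross likelihood ratio: each speaker's data is scored against the
    other speaker's model, relative to the background model."""
    return ((gmm_avg_loglik(frames_a, gmm_b) - gmm_avg_loglik(frames_a, ubm))
            + (gmm_avg_loglik(frames_b, gmm_a) - gmm_avg_loglik(frames_b, ubm)))

# Toy one-dimensional "speakers": A and B are close, C is far away.
frames_a, gmm_a = [[-0.2], [0.0], [0.3]], [(1.0, [0.0], [1.0])]
frames_b, gmm_b = [[0.4], [0.5], [0.8]], [(1.0, [0.5], [1.0])]
frames_c, gmm_c = [[4.8], [5.0], [5.3]], [(1.0, [5.0], [1.0])]
ubm = [(1.0, [2.0], [9.0])]  # broad background model

clr_ab = clr(frames_a, gmm_a, frames_b, gmm_b, ubm)
clr_ac = clr(frames_a, gmm_a, frames_c, gmm_c, ubm)
print(clr_ab > clr_ac)
```

In a clustering setting, such a score would typically be negated or otherwise converted into a distance before speakers are grouped; how the thesis does this is not stated in the abstract.
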
Gaussian mixture model (GMM) based speaker clustering is adopted to reduce the number of reference models. On this basis, appropriate reference speakers are chosen according to the acoustic features of the test speaker, which enables rapid speaker adaptation. In the clustering process, the model CLR is used as the distance measure, and a universal background model (UBM) is used to provide tighter coupling between the speaker models. The adapted model can be computed from previously stored hidden Markov model (HMM) statistics, so adaptation can be performed quickly. With speaker clustering used for speaker classification, better performance is obtained even with different numbers of model mixtures.

3. Dynamic Selection of Reference Speakers and a Related Improvement

This thesis proposes a new method, named SSVS, for dynamic selection of reference speakers using a support vector machine (SVM); a related improvement, named RSSS, is also proposed. Good adaptation performance depends not only on the number of selected speakers but also on whether their statistics are sufficient to describe the distribution of the reference speakers. How to select the speakers remains a tricky problem that is usually settled by experiment. Selecting a dynamic rather than a fixed number of close speakers allows a trade-off between good coverage and small variance among the cohorts. We use an SVM to find a subset of training speakers who are acoustically close to the test speaker. This outperforms the general speaker selection method, since it chooses an optimal set of reference models while also saving computation time. Experimental results show that the SSVS algorithm obtains a relatively accurate model. It can be concluded that the dynamic selection of reference speakers depends on finding appropriate support vectors.
Relying on the kernel function to compute the distance between two samples in the high-dimensional feature space, we traverse the training set and obtain the set of samples near the optimal classification surface; these samples are the ones used to represent the reference speakers. In addition, a confidence measure is applied in the selection process. The experimental results show that the proposed method effectively improves recognition accuracy.
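
Two ideas from item 3 can be illustrated in a few lines: measuring distance in the kernel-induced feature space, and letting the number of selected reference speakers vary with the data. The sketch below is not the SSVS algorithm, which selects speakers through the support vectors of a trained SVM; it only shows the kernel distance identity ||phi(x) - phi(y)||^2 = k(x,x) - 2k(x,y) + k(y,y) and a simple radius-based selection. The RBF kernel, the `gamma` value, the radius threshold, the per-speaker vectors, and all function names are assumptions chosen for illustration.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2); gamma is an assumed value."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def feature_space_distance(x, y, kernel=rbf_kernel):
    """Distance between phi(x) and phi(y), computed only through the kernel."""
    d2 = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(d2, 0.0))  # guard against tiny negative round-off

def select_reference_speakers(test_vec, speaker_vecs, radius):
    """Return ids of training speakers within `radius` of the test speaker,
    nearest first. The number returned depends on the data, not on a fixed N."""
    scored = [(feature_space_distance(test_vec, vec), sid)
              for sid, vec in speaker_vecs.items()]
    return [sid for dist, sid in sorted(scored) if dist <= radius]

# Hypothetical per-speaker vectors (for example, averaged acoustic features).
speakers = {
    "spk1": [0.1, 0.0],
    "spk2": [0.3, -0.2],
    "spk3": [3.0, 3.0],
    "spk4": [-0.5, 0.4],
}
test_speaker = [0.0, 0.0]

tight = select_reference_speakers(test_speaker, speakers, radius=0.5)
loose = select_reference_speakers(test_speaker, speakers, radius=1.0)
print(tight, loose)
```

A tighter radius yields a smaller, more homogeneous cohort, and a looser one gives broader coverage, which mirrors the trade-off between coverage and variance described above.
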
Keywords/Search Tags: Continuous Speech Recognition, Speaker Adaptation, Reference Speaker Model, Speaker Clustering, Support Vector Machine