
Mongolian Language Oriented Research On Acoustic Modeling For Speech Recognition

Posted on: 2017-02-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X R M Bao
GTID: 1108330485966600
Subject: Computer application technology
Abstract/Summary:
Speech recognition is a human-machine interface technology with great application value and broad prospects. Acoustic modeling is the key step and core technique in building speech recognition systems, and it is one of the main research focuses of the field. In this dissertation, we conduct in-depth research on several issues in acoustic modeling for Mongolian speech recognition. Under the influence of global informatization, the Mongolian autonomous regions are rapidly entering the information society. Against this background, the research has both academic value and far-reaching social significance: it can help raise the level of automation in the daily life, study and work of the Mongolian people, and the level of informatization of the Mongolian autonomous regions.

The research addresses three basic problems of acoustic modeling for Mongolian speech recognition: model selection, supporting techniques and parameter estimation. The innovations and research contents of this dissertation are as follows.

1. Topology optimization of Mongolian acoustic models

For model selection: when Mongolian speech recognition systems are built, the acoustic model topologies of large modeling objects (units whose pronunciation consists of two or more phones) are currently chosen empirically or heuristically. This dissertation proposes two topology optimization algorithms for the acoustic models of large modeling objects, one based on the standard genetic algorithm and one based on the standard particle swarm optimization algorithm. It also presents solutions to the implementation issues of these algorithms and a strategy for training systems with non-uniformly allocated kernels.
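To illustrate the idea of searching over model topologies, the following is a minimal sketch of a genetic search in which each chromosome holds one Gaussian-kernel count per state, so the number of states and the (possibly non-uniform) kernel allocation are optimized together. It is not the dissertation's algorithm: the search bounds, the `toy_fitness` function, the truncation selection and the elitism are assumptions made for illustration. In a real system the fitness would be the recognition accuracy of a model trained with the candidate topology.

```python
import random

# Assumed search bounds (hypothetical, not taken from the dissertation).
MIN_STATES, MAX_STATES = 3, 12
MIN_KERNELS, MAX_KERNELS = 1, 8

def random_topology(rng):
    """A chromosome is a list with one Gaussian-kernel count per state."""
    n_states = rng.randint(MIN_STATES, MAX_STATES)
    return [rng.randint(MIN_KERNELS, MAX_KERNELS) for _ in range(n_states)]

def toy_fitness(topology):
    """Placeholder fitness (assumption). A real system would train a model
    with this topology and return its accuracy on held-out data."""
    target_states, target_kernels = 7, 4  # arbitrary toy optimum
    state_penalty = abs(len(topology) - target_states)
    kernel_penalty = sum(abs(k - target_kernels) for k in topology) / len(topology)
    return -(state_penalty + kernel_penalty)

def crossover(a, b, rng):
    """One-point crossover on variable-length parents, kept within bounds."""
    cut_a = rng.randint(1, len(a) - 1)
    cut_b = rng.randint(1, len(b) - 1)
    child = (a[:cut_a] + b[cut_b:])[:MAX_STATES]
    while len(child) < MIN_STATES:
        child.append(rng.randint(MIN_KERNELS, MAX_KERNELS))
    return child

def mutate(topology, rng, rate=0.2):
    """Perturb kernel counts and occasionally add or drop a state."""
    out = [min(MAX_KERNELS, max(MIN_KERNELS, k + rng.choice((-1, 1))))
           if rng.random() < rate else k for k in topology]
    if rng.random() < rate and len(out) < MAX_STATES:
        out.append(rng.randint(MIN_KERNELS, MAX_KERNELS))
    elif rng.random() < rate and len(out) > MIN_STATES:
        out.pop(rng.randrange(len(out)))
    return out

def optimize(fitness, generations=30, pop_size=20, seed=0):
    """Return the best topology found and the best fitness per generation."""
    rng = random.Random(seed)
    population = [random_topology(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0]]              # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness), history

best, history = optimize(toy_fitness)
print(len(best), "states, kernels per state:", best)
```

Because the best individual is copied unchanged into each new generation, the best fitness in `history` never decreases. The expensive part in practice is the fitness evaluation, since every candidate topology requires training and decoding.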
Compared with earlier applications of evolutionary methods to speech recognition in other languages, both algorithms are automatic search algorithms that optimize the number of model states and the number of kernels per state simultaneously and do not assume a uniform allocation of Gaussian kernels. In the experiments that verify and apply the algorithms, every speech recognition system whose topologies for Mongolian large modeling objects were optimized with either algorithm performed much better than two groups of baselines whose topologies were selected with two conventional methods. Relative to the best-performing baseline, the increases in word accuracy of the two groups of topology-optimized systems, each group built with one of the two algorithms, reached 11.52 and 10.42 percentage points, respectively.

2. State clustering of Mongolian acoustic models: design of the question set

For supporting techniques: Mongolian speech recognition researchers urgently need a sound and complete Mongolian question set to support decision-tree-based parameter tying in acoustic modeling. This dissertation discusses solutions to key issues in designing such question sets (design principles, selection of the phone set, extension of phoneme tables, classification of diphthongs and application of the slack and tight concepts), presents a question set suitable for standard Mongolian, and points out its advantages over existing Mongolian question sets.
In multiple comparative experiments between the decision-tree and data-driven methods, the recognition rates obtained with decision-tree-based state clustering using this question set were all slightly higher than those obtained with the data-driven method, which indicates that the question set supports decision-tree-based state clustering and tying effectively. In experiments comparing Mongolian question sets, it also outperformed the existing Mongolian question sets, which suggests that its design is more reasonable and more complete.

3. Discriminative training of Mongolian acoustic models

For model parameter estimation: mutual misrecognition of Mongolian phones that sound similar in spoken Mongolian seriously degrades the performance of Mongolian speech recognition systems. This dissertation applies discriminative training techniques, which improve recognition performance by increasing the discrimination between models, to Mongolian speech recognition for the first time.
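Mutual misrecognition between similar-sounding phones, as described above, is commonly made visible by counting (reference phone, recognized phone) pairs in a confusion matrix. The following is a minimal sketch of that bookkeeping. The phone labels are made-up placeholders rather than a Mongolian phone set, and the sequences are assumed to be already aligned one-to-one; real decoder output would first need an edit-distance alignment to handle insertions and deletions.

```python
from collections import Counter

def confusion_counts(reference, hypothesis):
    """Count (reference phone, recognized phone) pairs.

    Assumes both sequences are already aligned position by position."""
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must be aligned to the same length")
    return Counter(zip(reference, hypothesis))

def top_confusions(counts, n=3):
    """Return the n most frequent off-diagonal (misrecognized) pairs."""
    errors = {pair: c for pair, c in counts.items() if pair[0] != pair[1]}
    return sorted(errors.items(), key=lambda item: (-item[1], item[0]))[:n]

# Placeholder phone sequences (hypothetical data for illustration only).
reference = ["a", "e", "o", "u", "a", "e"]
baseline = ["a", "o", "o", "o", "a", "o"]
retrained = ["a", "e", "o", "o", "a", "e"]

print(top_confusions(confusion_counts(reference, baseline)))
print(top_confusions(confusion_counts(reference, retrained)))
```

Comparing the two printed lists shows which confusions a retrained system removes and which remain.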
Specifically, this dissertation systematically applies the maximum mutual information, minimum word error and minimum phone error criteria to Mongolian acoustic modeling tasks, designs the application scheme, and solves technical issues such as automatic adjustment of the training data list, synthesis of models for triphones that do not occur in the training corpus, and automatic generation and processing of text files. It proposes and implements an experimental scheme, phone-level decoding with confusion-matrix-based comparison of the decoded results, which lets researchers observe the effects of discriminative training directly. Based on the confusion matrices of the discriminatively trained systems and their baselines, it also draws conclusions about misrecognition patterns of Mongolian phones during recognition. Experimental results indicate that each discriminative training criterion improves the recognition performance of Mongolian speech recognition systems significantly; the increase in word accuracy brought by discriminative training reached 6.44 percentage points.

4. Adaptive training of Mongolian acoustic models

Again for model parameter estimation: speaker-independent Mongolian speech recognition systems built from corpora of many speakers do not perform satisfactorily when used by a specific speaker. This dissertation applies speaker adaptive training techniques, which fit the acoustic models more closely to the characteristics of a specific speaker through model parameter re-estimation, to Mongolian speech recognition for the first time.
Specifically, after designing the application scheme and solving technical issues such as setting the number of leaf nodes of regression class trees, this dissertation systematically applies maximum likelihood linear regression and maximum a posteriori algorithms to Mongolian acoustic modeling tasks. We also explore further applications of speaker adaptive training algorithms: we verify experimentally the gender adaptation effects of these algorithms and their combinations in Mongolian acoustic modeling, and conduct exploratory research on Mongolian dialect adaptation using them. Experimental results indicate the following. Each parameter transformation algorithm of speaker adaptive training improves the recognition performance of Mongolian speech recognition systems significantly; the increase in word accuracy brought by adaptive training reached 32.75 percentage points. Speaker adaptive training also yields strong gender adaptation effects in Mongolian acoustic modeling; the increase in word accuracy brought by gender adaptation reached 47.08 percentage points. When gender adaptation effects are excluded, the algorithms and combinations of algorithms that transform only the Gaussian mixture mean vectors of the acoustic models can adapt standard Mongolian speech recognition systems to the Ordos dialect, with increases in word accuracy reaching 7.67 percentage points.
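As an illustration of adapting only Gaussian mean vectors, the standard maximum a posteriori update interpolates between the speaker-independent mean and the occupancy-weighted average of the adaptation data: the adapted mean is (tau * prior_mean + sum of posterior-weighted frames) / (tau + total occupancy). The sketch below implements this textbook formula for a single Gaussian. The function name, the default `tau` and the toy data are assumptions for illustration, and the code is not the dissertation's implementation.

```python
def map_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP re-estimate of one Gaussian mean vector.

    prior_mean: speaker-independent mean (list of floats)
    frames:     adaptation feature vectors (list of lists of floats)
    posteriors: occupation probability of this Gaussian for each frame
    tau:        prior weight; larger values trust the prior more
                (the default here is an arbitrary illustrative choice)
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    weighted_sum = [
        sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)
    ]
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]

# Toy data (hypothetical): ten frames at [2, 4], each fully assigned
# to this Gaussian, pull the prior mean [0, 0] halfway when tau = 10.
adapted = map_mean([0.0, 0.0], [[2.0, 4.0]] * 10, [1.0] * 10, tau=10.0)
print(adapted)
```

With little adaptation data the estimate stays near the speaker-independent mean, and with more data it moves toward the speaker's own average, which is why this kind of update behaves well across different amounts of adaptation speech.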
Keywords/Search Tags: Mongolian, acoustic modeling, model topology, question set, discriminative training, adaptive training