Acoustic Modeling For Continuous Speech Recognition

Posted on: 2003-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: L Xie
Full Text: PDF
GTID: 2168360092966185
Subject: Computer application technology
Abstract/Summary:
Speech recognition has advanced rapidly in recent years. Enabling computers to understand human speech, and even to converse with people, is a long-held dream that may come true in the near future. Current speech recognition technology focuses mainly on Continuous Speech Recognition (CSR). This thesis is part of a bilateral project between China and Belgium named "Machine Vision and Speech Technology of Augmented Reality". The first goal of the project is to generate and drive a Talking Head from speech recognition results, which requires building a CSR recognizer. Building the acoustic models of that recognizer is the core of this thesis.

First, the theory of the Hidden Markov Model (HMM), which is widely used in CSR, is briefly introduced. Applying HMMs to practical speech recognition requires solving three important problems, which are analyzed in detail. This part closes by introducing the Continuous Density HMM and the types of HMM.

Next, the overall structure of our recognizer is presented, including acoustic analysis of speech, acoustic modeling, and the recognition strategy. The detailed analysis focuses on building the acoustic HMMs, covering the choice of basic acoustic model, the English phoneme set, and the training method for the HMM parameters. The embedded training algorithm is highlighted in this part.

Because the basic HMMs gave poor recognition performance, I then modified and refined them. First, I introduce a context-dependent HMM, the triphone, which takes into account the neighboring context of each phoneme. These context-dependent models are then refined through techniques such as mixture incrementing and parameter tying. My research concentrates on state tying of the triphone models. A decision-tree-based top-down strategy is described, which uses acoustic-linguistic questions to classify model states.

The last part of the thesis reports several recognition experiments based on the research above. The results show that context-dependent models improve recognizer performance considerably, and that decision-tree-based parameter tying can balance the amount of training data available in the speech corpus against model accuracy. Future work is discussed at the end of the thesis.

Most of the work for this thesis was done in the ETRO department of the Vrije Universiteit Brussel (VUB), Belgium.
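
As a small illustration of the triphone idea mentioned in the abstract, the sketch below expands a phoneme sequence into context-dependent labels. The abstract does not give the thesis's label format or phoneme set; the `left-center+right` notation, the function name `to_triphones`, and the example phonemes are assumptions made here for illustration only.

```python
def to_triphones(phones):
    """Expand a phoneme sequence into context-dependent triphone labels.

    Assumed label format (not taken from the thesis): left-center+right.
    Phonemes at the sequence edges keep only the context that exists.
    """
    labels = []
    for i, center in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        label = center
        if left is not None:
            label = f"{left}-{label}"
        if right is not None:
            label = f"{label}+{right}"
        labels.append(label)
    return labels


# Hypothetical phoneme sequence for the word "speech".
print(to_triphones(["s", "p", "iy", "ch"]))
# → ['s+p', 's-p+iy', 'p-iy+ch', 'iy-ch']
```

Each distinct label would get its own HMM, so the number of models grows quickly with the phoneme set. This growth is the reason the abstract turns to parameter tying.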
Keywords/Search Tags:Continuous Speech Recognition (CSR), Acoustic Model, Hidden Markov Model (HMM), Embedded Training Algorithm, Context Dependent, Decision-Tree based State Tying