Research Of The GMM-HMM Based Acoustic Models

Posted on: 2017-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: W K Wang
GTID: 2308330503485286
Subject: Signal and Information Processing
Abstract/Summary:
Speech recognition aims to make a machine understand human speech by decoding an input speech signal into computer instructions. In recent years, with continuous improvements in acoustic and language models, the performance of speech recognition systems has improved significantly. As the low-level module of a speech recognition system, the acoustic model is one of its most important parts and is the focus of this thesis. The main task of the acoustic model is to build a template for each basic unit from the extracted features; these templates serve as the reference against which the decoder matches and searches.
The system designed in this thesis is based on statistical methods and comprises acoustic feature extraction, the acoustic model, the language model, and the decoder. Mel-Frequency Cepstral Coefficients (MFCC) are used as the main acoustic feature. The language model is an n-gram model, and the decoder implements the token passing algorithm. The acoustic model centers on training a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM). Training a statistical model means iteratively updating its parameters so as to maximize the probability of generating the observed sequence. The trained parameters affect recognition and decoding and therefore influence the performance of the system to some extent.
The system trains context-dependent triphone models, using GMMs to fit the probability distribution of the observations emitted by the HMM states. Decision-tree-based clustering is used to reduce the size of the model, and the number of Gaussian components is increased to improve accuracy. The Expectation-Maximization (EM) algorithm, based on Maximum Likelihood (ML) estimation, is used to train the parameters.
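The EM update mentioned above can be illustrated with a minimal sketch. It is not the thesis implementation: it fits a one-dimensional GMM to synthetic data in plain Python and leaves out the HMM state alignment that Baum-Welch training in HTK would add. The function names (`gaussian_pdf`, `log_likelihood`, `em_step`), the variance floor, and the toy data are all assumptions made for illustration.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def em_step(data, weights, means, variances, var_floor=1e-3):
    """One EM iteration; returns updated (weights, means, variances)."""
    k = len(weights)
    # E-step: posterior probability (responsibility) of each component per sample.
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: maximum-likelihood re-estimation from weighted statistics.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        occ = sum(r[j] for r in resp)  # soft count of samples for component j
        mean = sum(r[j] * x for r, x in zip(resp, data)) / occ
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / occ
        new_w.append(occ / len(data))
        new_m.append(mean)
        new_v.append(max(var, var_floor))  # hypothetical floor to avoid collapse
    return new_w, new_m, new_v

# Toy data: two well-separated clusters (synthetic, not speech features).
random.seed(0)
data = ([random.gauss(-2.0, 0.5) for _ in range(200)]
        + [random.gauss(3.0, 1.0) for _ in range(200)])

weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
history = [log_likelihood(data, weights, means, variances)]
for _ in range(20):
    weights, means, variances = em_step(data, weights, means, variances)
    history.append(log_likelihood(data, weights, means, variances))
```

Each iteration should leave the log-likelihood in `history` no lower than before, which is the property that makes EM suitable for maximum-likelihood training. In a GMM-HMM, the responsibilities would additionally be weighted by state occupation probabilities from the forward-backward pass.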
The final system is trained on the WSJ0 dataset and implemented with HTK; it achieves a recognition accuracy of 90.76% on the Nov92 test set. In addition, training uses a FIFO pipeline parallel model implemented on the SGE platform, which yields a 7- to 14-fold parallel speed-up.
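The abstract does not say how the accuracy figure was computed. As a hedged illustration, word accuracy in HTK-style scoring is commonly defined as (N - S - D - I) / N, where N is the number of reference words and S, D, I are substitutions, deletions, and insertions. The sketch below assumes that definition and uses a unit-cost Levenshtein alignment, which may differ slightly from the alignment a real scoring tool performs; the function names and sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions
    needed to turn the word list ref into the word list hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion of a reference word
                curr[j - 1] + 1,           # insertion of a hypothesis word
                prev[j - 1] + (r != h),    # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]

def word_accuracy(ref_sentences, hyp_sentences):
    """Percent word accuracy, (N - S - D - I) / N, over paired sentences."""
    total_words = sum(len(r.split()) for r in ref_sentences)
    total_errors = sum(edit_distance(r.split(), h.split())
                       for r, h in zip(ref_sentences, hyp_sentences))
    return 100.0 * (total_words - total_errors) / total_words

# Made-up example: one substitution (sat -> sit) and one deletion (the).
refs = ["the cat sat on the mat"]
hyps = ["the cat sit on mat"]
accuracy = word_accuracy(refs, hyps)
```

Because insertions count against the score, this measure can fall below zero for very poor hypotheses, unlike a simple percent-correct figure.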
Keywords/Search Tags: Gaussian Mixture Model, Hidden Markov Model, Speech, Acoustic Model, Parallel Model