
Research On Tibetan Non-specific Continuous Speech Recognition Based On Deep Learning

Posted on: 2018-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: N Zhou
GTID: 2358330542984070
Subject: Computer Science and Technology
Abstract/Summary:
The GMM-HMM acoustic model has been successful in speech recognition, but as the amount of speech data grows, the number of model parameters grows with it, and the parameters can no longer be trained sufficiently. This limits the recognition rate of GMM-HMM based systems. In the context of big data, deep learning has strong modeling capability for massive data and has been widely used in many fields of pattern recognition. In recent years, with the development of the deep neural network (DNN), features extracted by DNNs have proved highly robust and semantically discriminative, and their effectiveness has been verified in speech recognition tasks for English and Chinese. Deep learning based speaker-independent continuous speech recognition for Tibetan, however, has not yet been studied in depth. This paper therefore discusses the application of deep neural networks and deep bottleneck neural networks to continuous speech recognition.

1. Research on continuous speech recognition of the Tibetan Lhasa dialect based on GMM-HMM

The GMM-HMM acoustic model uses MFCC features, rests on a complete theory, and trains efficiently. In this paper, a GMM-HMM acoustic model and a 3-gram language model are implemented on the HTK platform. The correct rate is 82.90% and the accuracy rate is 79.35%. We analyzed the recognition rate for different numbers of Gaussian mixture components. For a fixed amount of training data, the recognition rate rises as the number of mixture components increases, but beyond a certain number it declines because of data sparsity.

2. Research on continuous speech recognition of the Tibetan Lhasa dialect based on DNN-HMM

The GMM-HMM model is built on MFCC features, and the MFCC feature of each frame usually covers only milliseconds of the speech signal. Each frame therefore carries too little information and is easily affected by noise. This paper studies the network structure, pre-training, and parameter tuning of the deep neural network, and builds a deep neural network for speech feature extraction in a Tibetan speech recognition system on the Kaldi platform. The output-layer features of the deep neural network are used to train the HMM-based acoustic model. The single-syllable correct rate is 85.39% and the accuracy rate is 84.68%.

3. Research on continuous speech recognition of the Tibetan Lhasa dialect based on deep bottleneck features

The posterior features of a DNN cannot be used directly in the mature and efficient GMM-HMM framework. A deep neural network with a narrow bottleneck layer can solve this problem. The bottleneck feature not only captures long-term context dependence and gives a compact representation of the speech signal, but can also replace the traditional MFCC features in GMM-HMM acoustic modeling. Based on this idea, this paper studies Tibetan continuous speech recognition based on bottleneck features and on the concatenation of bottleneck features with MFCC features. The experimental results show that the concatenated bottleneck and MFCC features represent the speech signal best and give the highest recognition rate. The single-syllable correct rate is 86.44% and the accuracy rate is 85.80%.

4. The online speech recognition system for Lhasa-Tibetan

On the Kaldi speech recognition platform, we built an online speech recognition system for Lhasa-Tibetan based on the deep bottleneck features. The system takes Tibetan speech as input through a microphone, automatically uses the trained acoustic model, language model, and dictionary file to recognize the content of the speech signal, and displays the recognized Tibetan characters on the console.
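The abstract reports both a correct rate and an accuracy rate for each system. The abstract does not define them, so the following is a minimal sketch that assumes the convention used by HTK's HResults tool: the correct rate counts only deletions and substitutions as errors, while the accuracy rate also penalizes insertions. The class name, function names, and error counts below are hypothetical and chosen for illustration; they are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class AlignmentCounts:
    """Error counts from aligning recognized syllables to the reference.

    All field names are illustrative, not from the thesis.
    """
    n_ref: int          # number of syllables in the reference transcripts
    deletions: int      # reference syllables missing from the output
    substitutions: int  # reference syllables recognized as something else
    insertions: int     # extra syllables in the output


def percent_correct(c: AlignmentCounts) -> float:
    """Correct rate: (N - D - S) / N * 100, ignoring insertions."""
    if c.n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return 100.0 * (c.n_ref - c.deletions - c.substitutions) / c.n_ref


def percent_accuracy(c: AlignmentCounts) -> float:
    """Accuracy rate: (N - D - S - I) / N * 100, penalizing insertions."""
    if c.n_ref <= 0:
        raise ValueError("n_ref must be positive")
    errors = c.deletions + c.substitutions + c.insertions
    return 100.0 * (c.n_ref - errors) / c.n_ref


# Hypothetical counts, not results from the thesis.
counts = AlignmentCounts(n_ref=1000, deletions=40, substitutions=100, insertions=30)
print(f"correct rate:  {percent_correct(counts):.2f}%")   # 86.00%
print(f"accuracy rate: {percent_accuracy(counts):.2f}%")  # 83.00%
```

Under this convention the accuracy rate can never exceed the correct rate, which is consistent with every pair of figures reported in the abstract; the gap between the two reflects the share of inserted syllables.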
Keywords/Search Tags:Lhasa-Tibetan speaker-independent continuous speech recognition, GMM-HMM, DNN-HMM, bottleneck feature, online speech recognition system