ASR Research Based on CTC

Posted on: 2020-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Jiang
Full Text: PDF
GTID: 2428330590454870
Subject: Software engineering
Abstract/Summary:
Speech recognition is often called the pearl in the crown of artificial intelligence; its goal is natural voice communication between humans and machines. For a long time, the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) has been the mainstream approach to acoustic modeling. However, as application scenarios expand, traditional models are increasingly unable to meet performance requirements, and their shortcomings are especially evident in recognition tasks for minority languages such as Uyghur. Connectionist Temporal Classification (CTC) is currently the mainstream method in industry. It avoids the excessive time and memory cost of the alignment step in acoustic model training. Previous work has generally used long short-term memory (LSTM) networks for acoustic model training. However, given the large number of CTC parameters and the high computational cost of LSTM, it is difficult to train a CTC model adequately. Inspired by the model compression ability of the deep feed-forward sequential memory network (DFSMN), this thesis combines DFSMN with CTC to train the acoustic model. Experimental results show that, when trained with context-dependent phonemes and the cross-entropy (CE) criterion, the new model achieves an 11% performance improvement over the LSTM network model. In addition, to reduce redundant features, this thesis quantifies the contribution of different phonetic and acoustic characteristics of the same language across several tasks. The experiments measure the effect of five main acoustic characteristics on different recognition tasks. The results show that pronunciation duration contributes most in both the dialect recognition task and the gender recognition task, while phoneme energy contributes relatively little, which is consistent with intuition.
Keywords/Search Tags: speech recognition, acoustic model, language model, hidden Markov model, Connectionist Temporal Classification, feature contribution
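
The abstract credits CTC with removing the costly alignment step from acoustic model training. As a rough illustration of why no explicit frame-level alignment is needed, the sketch below computes the probability of a label sequence by summing over all frame-level paths that collapse to it, using the standard CTC forward recursion. It is not the thesis's implementation: the function names, the choice of index 0 for the blank symbol, and the toy probabilities are all assumptions made for illustration.

```python
from itertools import product

BLANK = 0  # assumed index of the CTC blank symbol


def collapse(path):
    """Map a frame-level path to labels: merge repeats, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return tuple(out)


def ctc_prob(probs, labels):
    """P(labels | probs) via the CTC forward recursion.

    probs[t][k] is the per-frame probability of symbol k at frame t,
    e.g. the softmax output of an acoustic model.
    """
    # Extended sequence: a blank before, between, and after every label.
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]
    size = len(ext)

    alpha = [0.0] * size
    alpha[0] = probs[0][BLANK]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]              # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]     # advance by one position
            # Skipping a blank is allowed only between different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def ctc_prob_bruteforce(probs, labels):
    """Same quantity by enumerating every path (tiny inputs only)."""
    num_frames, num_symbols = len(probs), len(probs[0])
    total = 0.0
    for path in product(range(num_symbols), repeat=num_frames):
        if collapse(path) == tuple(labels):
            p = 1.0
            for t, s in enumerate(path):
                p *= probs[t][s]
            total += p
    return total


# Toy per-frame distributions over {blank, 1, 2}; values are made up.
toy_probs = [
    [0.2, 0.5, 0.3],
    [0.6, 0.1, 0.3],
    [0.1, 0.7, 0.2],
]

if __name__ == "__main__":
    print(ctc_prob(toy_probs, [1, 2]))
    print(ctc_prob_bruteforce(toy_probs, [1, 2]))
```

The recursion visits each frame once, whereas the brute-force version enumerates every possible path; the two should agree on small inputs, which makes the brute-force function a convenient check. In practice a training loss would take the negative logarithm of this probability and work in log space for numerical stability.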