ASR Research Based On CTC

Posted on:2020-05-27

Degree:Master

Type:Thesis

Country:China

Candidate:Y P Jiang

Full Text:PDF

GTID:2428330590454870

Subject:Software engineering

Abstract/Summary:

Speech recognition has been called the pearl on the crown of artificial intelligence, and its goal is natural voice communication between humans and machines. For a long time, the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) has been the mainstream approach to acoustic modeling. However, as application scenarios expand, traditional models are increasingly unable to meet performance requirements, and their shortcomings are especially evident in recognition tasks for low-resource languages such as Uygur. Connectionist Temporal Classification (CTC) is currently the mainstream method in industry. It avoids the excessive time and space cost of the alignment step in acoustic model training. In earlier work, long short-term memory (LSTM) networks were typically used for acoustic model training. However, given the large number of CTC parameters and the high computational cost of LSTM, it is difficult to train the CTC model adequately. Inspired by the model compression ability of the deep feed-forward sequential memory network (DFSMN), this thesis trains the acoustic model with a combination of DFSMN and CTC. Experimental results show that, when trained with context-dependent phonemes and the cross-entropy (CE) criterion, the new model achieves an 11% performance improvement over the LSTM network model. In addition, to reduce redundant features, this thesis conducts quantitative experiments on the contribution of different phonetic and acoustic characteristics of the same language to various tasks. The experiments measure the effects of five main acoustic characteristics on different recognition tasks. The results show that pronunciation duration contributes the most in both the dialect recognition task and the gender recognition task, while phoneme energy contributes relatively little, which is in line with intuition.
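
As a rough illustration of why CTC does not need a frame-level alignment: the network emits one label (or a blank) per frame, and the output sequence is obtained by merging consecutive repeats and then dropping blanks. The sketch below shows only this collapse rule in Python. The blank symbol "_", the function name ctc_collapse, and the frame labels are assumptions made for illustration, not taken from the thesis.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for illustration


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a per-frame label path to an output sequence (CTC collapse rule)."""
    # Step 1: merge runs of identical consecutive labels.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: drop blanks; a blank between two equal labels keeps them distinct.
    return [label for label in merged if label != blank]


# Two different frame-level paths (made-up data) collapse to the same sequence,
# so training can sum over such paths instead of requiring one fixed alignment.
path_a = ["_", "h", "h", "_", "i", "i", "_"]
path_b = ["h", "_", "_", "_", "i", "_", "_"]
```

Because many paths map to the same label sequence, the CTC loss can marginalize over them, which is what removes the separate forced-alignment step described in the abstract.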

Keywords/Search Tags:

speech recognition, acoustic model, language model, hidden Markov model, Connectionist Temporal Classification, feature contribution

Related items

1	Research On Connectionist Temporal Classification In Speech Recognition
2	Construction And Experiment Of Acoustic Model Based On CNN
3	Study And Improve On The Mongolian Speech Recognition System
4	Research And Application Of Deep Learning Based Continuous Speech Recognition
5	A Study On The Extraction Of Speech Depth In Tibetan Language And Its Speech Recognition
6	Chinese Speech Recognition System Based On CLDNN Hybrid Model
7	Research On Chinese Continuous Speech Recognition In Noisy Environment
8	Researching Of The Mongolian Acoustic Model Based On Speech Recognition
9	Research And System Realization Of Tibetan Continuous Speech Recognition Technology
10	Research Of Speech Recognition Based On Mixture Feature Extraction And Improved Continuous Hidden Markov Model