Research On Connectionist Temporal Classification In Speech Recognition

Posted on:2020-03-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Wang


GTID:2428330572489665

Subject:Software engineering

Abstract/Summary:


With advances in artificial intelligence research and the continuing accumulation of large speech corpora, speech recognition has developed rapidly. Neural networks are now widely applied in speech recognition, and end-to-end speech recognition has recently become a hot topic in artificial intelligence research. However, because of the complexity of real application scenarios and the pronunciation characteristics of speakers, end-to-end speech recognition models for Chinese achieve relatively low accuracy. To address these problems, we take Chinese pronunciation characteristics into account and optimize the structure of the current mainstream end-to-end speech recognition model, with the aim of improving the recognition performance and training efficiency of the end-to-end speech recognition framework for Chinese. First, we design a baseline experiment that combines a Gaussian Mixture Model (GMM)-Hidden Markov Model (HMM) acoustic model, a lexicon, and an N-gram language model. In the GMM-HMM study, because speech signals are easily affected by context, we consider the phonemes preceding and following the current phoneme when building the tri-phone acoustic model. To account for the differing speaking styles of speakers, we adopt speaker adaptation techniques in GMM-HMM modeling to increase the recognition accuracy of the baseline experiment. Then, to address the low accuracy of the end-to-end framework on Chinese, we use an incomplete end-to-end structure and apply it to speech recognition with the neural network temporal classification method, i.e. Connectionist Temporal Classification (CTC). Because the LSTM-CTC end-to-end model has drawbacks such as high computational complexity and long training time, we propose an improved model, the Projection Long Short-term Memory (PLSTM), to speed up model training. Because long-term dependence in speech does not run only in the forward direction, we use a bidirectional Long Short-term Memory (Bi-LSTM) instead of an LSTM or RNN in combination with CTC, which helps improve accuracy. Finally, we run our experiments on the AISHELL speech database and use speed-perturbed training data to avoid overfitting while training the Bi-LSTM. In the final experimental results, both the accuracy and the speed of the model are significantly improved compared with the baseline.
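As a hedged illustration of how CTC turns frame-level network outputs into a label sequence, the sketch below shows standard best-path (greedy) CTC decoding: take the top label in each frame, merge consecutive repeats, then drop the blank label. It is not taken from the thesis; the blank index `0`, the function name `ctc_greedy_decode`, and the toy scores are assumptions made for this example only.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Highest-scoring label index in every frame.
    best_path = [max(range(len(frame)), key=lambda i: frame[i])
                 for frame in frame_scores]
    # Collapse runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(best_path)]
    # Remove blanks; a blank between two equal labels keeps them distinct.
    return [label for label in merged if label != blank]


# Toy per-frame scores over 3 labels (index 0 is the blank), 6 frames.
scores = [
    [0.10, 0.80, 0.10],  # label 1
    [0.20, 0.70, 0.10],  # label 1 (repeat, merged)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.80, 0.10],  # label 1 again, kept because of the blank
    [0.10, 0.20, 0.70],  # label 2
    [0.10, 0.10, 0.80],  # label 2 (repeat, merged)
]
decoded = ctc_greedy_decode(scores)
print(decoded)
```

In this toy case the per-frame path 1, 1, blank, 1, 2, 2 collapses to the label sequence 1, 1, 2. In a Bi-LSTM plus CTC system, the per-frame scores would come from the network's softmax output rather than a hand-written table.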

Keywords/Search Tags:

Hidden Markov and Gaussian Mixture Model, Connectionist Temporal Classification, Projected Long Short-Term Memory Network, Bidirectional Neural Network, Speed Perturbation


Related items

1	Research On Temporal Action Detection In Video
2	Amdo Tibetan Speech Recognition Based On Deep Neural Network
3	Research On Short Text Classification Method Based On Contextual Feature Expression
4	Long Short Term Memory Recurrent Neural Network Application To Handwritten Recognition
5	Phishing Websites Detection Using Selected Features Classification And Bidirectional Long Short-Term Memory Neural Networks
6	Research On Relation Classification Via Bidirectional Long Short-Term Memory Networks With Attention Mechanism
7	Research On Chinese Text Classification Method Based On Long And Short Term Memory Network
8	Research On Network Intrusion Detection Method Based On Bi-LSTM
9	Research On End-to-End Speech Recognition Based On GRU And Self-Attention Mechanism
10	Research On Text Classification Based On Deep Learning