
Research On Connectionist Temporal Classification In Speech Recognition

Posted on: 2020-03-29  Degree: Master  Type: Thesis
Country: China  Candidate: Y M Wang  Full Text: PDF
GTID: 2428330572489665  Subject: Software engineering
Abstract/Summary:
With advances in artificial intelligence research and the continuing accumulation of big-data corpora, speech recognition has developed rapidly. Neural networks are now widely applied in speech recognition, and end-to-end speech recognition has recently become a hot topic in artificial intelligence research. However, because of the complexity of real application scenarios and of speakers' pronunciation characteristics, end-to-end speech recognition models for Chinese achieve relatively low accuracy. To address these problems, we take Chinese pronunciation characteristics into account and optimize the structure of the current mainstream end-to-end speech recognition model, with the aim of improving the recognition performance and training efficiency of the end-to-end framework for Chinese. First, we design a baseline experiment that combines a Gaussian Mixture Model (GMM)-Hidden Markov Model (HMM) acoustic model, a lexicon, and an N-gram language model. Because speech signals are strongly affected by context, the tri-phone acoustic model in the GMM-HMM study takes the phonemes preceding and following the current phoneme into account. Because speaking style varies between speakers, we also apply speaker adaptation techniques in GMM-HMM modeling to raise the recognition accuracy of the baseline. Then, to address the low accuracy of the end-to-end framework on Chinese, we adopt a partially end-to-end structure and apply it to speech recognition based on Connectionist Temporal Classification (CTC). Because the LSTM-CTC end-to-end model has drawbacks such as high computational complexity and long training time, we propose an improved model, Projection Long Short-term Memory (PLSTM), to speed up training. Because long-term dependencies in speech do not run only in the forward direction, we combine CTC with a bidirectional Long Short-term Memory (Bi-LSTM) network instead of a unidirectional LSTM or RNN, which helps improve accuracy. Finally, we run our experiments on the AISHELL speech database and use speed-perturbed training data to avoid overfitting when training the Bi-LSTM. In the final results, both the accuracy and the speed of the model improve significantly over the baseline.
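To make the training-cost argument for PLSTM concrete, the following is a minimal sketch of the trainable-parameter count of one LSTM layer with and without a recurrent projection. It assumes a standard LSTM without peephole connections, with the projection applied to the output that is fed back into the gates; the function names and the layer sizes (40 input features, 1024 cells, 256 projected units) are hypothetical and are not taken from the thesis.

```python
def lstm_params(n_in, n_cell):
    """Parameters of a plain LSTM layer (assumed: no peepholes).

    Each of the 4 gate blocks has input weights (n_cell x n_in),
    recurrent weights (n_cell x n_cell) and a bias (n_cell).
    """
    return 4 * (n_cell * n_in + n_cell * n_cell + n_cell)


def plstm_params(n_in, n_cell, n_proj):
    """Parameters of an LSTM layer with a recurrent projection (assumed form).

    The recurrent weights shrink to (n_cell x n_proj) because the gates
    read the projected output, at the cost of one extra projection
    matrix (n_proj x n_cell).
    """
    gates = 4 * (n_cell * n_in + n_cell * n_proj + n_cell)
    projection = n_proj * n_cell
    return gates + projection


# Hypothetical sizes chosen only for illustration.
plain = lstm_params(n_in=40, n_cell=1024)
projected = plstm_params(n_in=40, n_cell=1024, n_proj=256)
print(plain, projected)  # 4362240 1478656
```

Under these assumed sizes the projected layer has roughly a third of the parameters of the plain layer, which is the kind of reduction that can shorten training; the actual savings depend on the dimensions used in the thesis.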
Keywords/Search Tags: Hidden Markov and Gaussian Mixture Model, Connectionist Temporal Classification, Projected Long Short-Term Memory Network, Bidirectional Neural Network, Speed Perturbation