Chinese Speech Recognition System Based On CLDNN Hybrid Model

Posted on:2022-05-08

Degree:Master

Type:Thesis

Country:China

Candidate:L J Wu

GTID:2518306524989349

Subject:Master of Engineering

Abstract/Summary:

At present, the Gaussian Mixture Model and Hidden Markov Model (GMM-HMM), the most mature approach in the field of ASR, is easy to implement thanks to its simple structure and fast training on small datasets. However, as corpora grow larger and the accuracy required of ASR rises, the state-based nature of GMM-HMM makes it impossible to cover all textual relationships, which leads to poor adaptability to data and poor recognition results. The model is also trained on individual phoneme segments of speech paired with the corresponding text, which requires forced alignment of the corpus speech with its text labels. These drawbacks make traditional speech recognition models such as GMM-HMM increasingly unable to meet users' needs.

To solve these problems, the thesis analyses the Convolutional, Long Short-Term Memory, Deep Neural Network (CLDNN) and presents three improvements to the model. First, the input signal length is unified so that each input contains a whole utterance and forced alignment is avoided. Second, to strengthen the suitability of the model, the thesis uses a deep CNN instead of the original shallow CNN, which can extract higher-level features and is better adapted to the Chinese ASR task. Third, to simplify the model, the thesis replaces the LSTM layers with a BiGRU (Bidirectional Gated Recurrent Unit): two GRUs running in opposite directions are combined into a bidirectional GRU that captures both past and future context at each time step. To verify the effectiveness of the three improvements, the thesis takes the traditional CLDNN as the benchmark and compares it with the model that includes the three improvements, with the traditional GMM-HMM, which still needs aligned labels, and with the mainstream Deep CNN-CTC. The experimental results show the superiority of the improved CLDNN-CTC both in accuracy and in not requiring alignment.

Finally, the thesis builds a Chinese ASR system based on the improved CLDNN. Users can use ASR in the browser, give feedback on the recognition results, and add special words to a correction dictionary. The correction dictionary helps correct the recognition results and improve recognition accuracy. To evaluate the system, the thesis designs tests whose results show that the correction dictionary and ASR achieve the desired effect, with good robustness and generalization performance.
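Three steps mentioned in the abstract (unifying input length, decoding CTC output, and applying the correction dictionary) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the thesis's implementation: the abstract does not say how lengths are unified (zero-padding is assumed), which decoding strategy is used (greedy best-path decoding is assumed), or how the correction dictionary is stored (a plain mapping from misrecognized string to corrected string is assumed). The function names are hypothetical.

```python
from typing import Dict, List, Sequence

BLANK = 0  # assumed index of the CTC blank label


def pad_to_length(frames: Sequence[Sequence[float]], target_len: int,
                  pad_value: float = 0.0) -> List[List[float]]:
    """Pad (or truncate) a feature sequence to exactly target_len frames."""
    if not frames:
        raise ValueError("empty utterance")
    dim = len(frames[0])
    padded = [list(f) for f in frames[:target_len]]
    # range() is empty when the utterance is already long enough.
    padded += [[pad_value] * dim for _ in range(target_len - len(padded))]
    return padded


def ctc_greedy_decode(frame_labels: Sequence[int],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: merge repeated labels, then drop blanks."""
    decoded: List[int] = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def apply_corrections(text: str, corrections: Dict[str, str]) -> str:
    """Replace known misrecognitions, trying the longest entries first."""
    for wrong in sorted(corrections, key=len, reverse=True):
        text = text.replace(wrong, corrections[wrong])
    return text
```

A blank between two identical labels keeps them separate, so `ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0])` yields `[3, 3, 5]`. Because the replacements in `apply_corrections` run one after another, an earlier replacement can in principle be rewritten by a later entry; a real system would likely need a more careful matching strategy.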

Keywords/Search Tags:

Convolutional Neural Network (CNN), Bidirectional Gated Recurrent Unit (BiGRU), Connectionist Temporal Classification (CTC), Statistical Language Model, Chinese automatic speech recognition system

Related items

1	Research And Implementation Of End-to-End Speech Recognition Algorithm
2	Research On Tibetan Speech Recognition Based On Bidirectional Recurrent Neural Network
3	Research On Connectionist Temporal Classification In Speech Recognition
4	Research And Implementation Of Speech Recognition Algorithm Based On Recurrent Neural Network
5	Research On End-to-End Speech Recognition Based On GRU And Self-Attention Mechanism
6	Research On Speech Emotion Recognition Algorithm Based On Deep Learning
7	Research On End-to-end Speech Recognition Based On Convolutional Neural Networks
8	Research And Implementation Of End-to-End Long-term Speech Recognition Model Base On RNN-Transducer
9	ASR Research Based On CTC
10	The Design And FPGA Verification Of End-to-end Mandarin Speech Recognition Based On CNN