
Chinese Speech Recognition System Based on CLDNN Hybrid Model

Posted on: 2022-05-08  Degree: Master  Type: Thesis
Country: China  Candidate: L J Wu  Full Text: PDF
GTID: 2518306524989349  Subject: Master of Engineering
Abstract/Summary:
At present, the Gaussian Mixture Model and Hidden Markov Model (GMM-HMM), the most mature approach in the field of automatic speech recognition (ASR), is easy to implement because of its simple structure and fast training on small data sets. However, as corpora grow larger and the accuracy required of ASR rises, the state-based nature of GMM-HMM makes it unable to capture all the relationships in the text, which leads to insufficient adaptability to the data and poor recognition results. The model is also trained on individual phoneme segments of speech paired with the corresponding text, which requires forced alignment of the speech corpus with its text labels. These drawbacks leave traditional speech recognition models such as GMM-HMM increasingly unable to meet users' needs.

To solve these problems, the thesis analyses the Convolutional, Long Short-Term Memory, Deep Neural Network (CLDNN) and presents three improvements to the model. First, input signals are brought to a uniform length so that each input contains a whole utterance, which avoids forced alignment. Second, to make the model better suited to the task, the thesis replaces the original shallow CNN with a deep CNN, which can extract higher-level features and is better adapted to Chinese ASR. Third, to simplify the model, the thesis replaces the LSTM layers with a bidirectional gated recurrent unit (BiGRU): two GRUs running in opposite directions are combined into one bidirectional GRU, so that the model has access to both preceding and following context at each time step. To verify the effectiveness of the three improvements, the thesis takes the original CLDNN as the baseline and compares it with the model incorporating the three improvements, with the traditional GMM-HMM (which still requires label alignment), and with the mainstream deep CNN-CTC model. The experimental results show that the improved CLDNN-CTC is superior in accuracy and needs no alignment.

Finally, the thesis builds a Chinese ASR system based on the improved CLDNN. Users can run ASR in the browser, give feedback on the recognition results, and add special words to a correction dictionary. The correction dictionary helps correct the recognition results and improves recognition accuracy. To evaluate the system, the thesis designs a set of tests; the results show that the correction dictionary and the ASR component achieve the desired effect, with good robustness and generalization performance.
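The alignment-free pipeline described in the abstract can be illustrated with a small, dependency-free sketch: padding an utterance to a fixed length, decoding per-frame CTC outputs, and post-correcting the text with a dictionary. This is not the thesis's implementation. The function names, the blank index, the zero-padding strategy, the use of greedy (best-path) decoding, and the simple string-replacement correction are all assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import Dict, List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank label


def pad_features(frames: Sequence[Sequence[float]], target_len: int) -> List[List[float]]:
    """Zero-pad (or truncate) a whole utterance to a fixed number of frames."""
    if not frames:
        raise ValueError("utterance has no frames")
    dim = len(frames[0])
    padded = [list(f) for f in frames[:target_len]]
    padded += [[0.0] * dim for _ in range(target_len - len(padded))]
    return padded


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], vocab: Sequence[str]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[str] = []
    prev = None
    for idx in best:
        # Emit a label only when it differs from the previous frame and is not blank.
        if idx != prev and idx != BLANK:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


def apply_corrections(text: str, corrections: Dict[str, str]) -> str:
    """Replace known misrecognitions using a user-maintained dictionary (hypothetical format)."""
    # Longest keys first, so longer entries take priority over their substrings.
    for wrong in sorted(corrections, key=len, reverse=True):
        text = text.replace(wrong, corrections[wrong])
    return text


if __name__ == "__main__":
    vocab = ["<blank>", "你", "号"]          # toy vocabulary
    probs = [
        [0.1, 0.8, 0.1],
        [0.1, 0.7, 0.2],                      # repeated label, merged with the frame above
        [0.9, 0.05, 0.05],                    # blank frame
        [0.2, 0.1, 0.7],
        [0.6, 0.2, 0.2],                      # blank frame
    ]
    raw = ctc_greedy_decode(probs, vocab)
    print(apply_corrections(raw, {"你号": "你好"}))
```

Because CTC sums over all frame-to-label alignments during training, no forced alignment of frames to phonemes is needed; the decoder above only has to collapse repeats and blanks. A production system would more likely use beam search with a language model, as the keyword "Statistical Language Model" suggests.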
Keywords/Search Tags: Convolutional Neural Network (CNN), Bidirectional Gated Recurrent Unit (BiGRU), Connectionist Temporal Classification (CTC), Statistical Language Model, Chinese automatic speech recognition system