
Research On Tibetan Lhasa Dialect Speech Recognition

Posted on: 2020-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: G Zhou
Full Text: PDF
GTID: 2428330572485994
Subject: Intelligent information processing
Abstract/Summary:
Speech recognition is an indispensable technology for natural human-machine interfaces and has been very successful in applications such as voice search. Tibetan is one of the important languages of China. Research on Tibetan speech recognition can help reduce the language barrier between Tibetans and other ethnic groups in China, promote exchange, and increase mutual understanding between ethnic groups. It can also help accelerate development in the Tibetan economy, science and technology, culture, and other fields, promote Tibetan development and progress, and provide better services for the Tibetan people.

To realize continuous speech recognition for the Tibetan Lhasa dialect, this thesis analyses the linguistic characteristics of Tibetan and builds a Tibetan Lhasa speech database. Acoustic models for Tibetan Lhasa speech are built with hidden Markov models (HMM) and with deep neural networks (DNN). The language model is a word-level 3-gram model trained on text from the Tibetan text corpus. The thesis also applies end-to-end speech recognition technology to Tibetan Lhasa speech. The main work and innovations of this thesis are as follows.

First, a Tibetan Lhasa speech corpus is established. The Tibetan text corpus contains a total of 18000 sentences. The recordings involve 11 speakers, 9 female and 2 male Tibetan students, all of them young college students fluent in the Tibetan Lhasa dialect. Each student reads aloud from the Tibetan text, with an average of 8 Tibetan characters per sentence. A total of 7584 Tibetan Lhasa utterances are recorded, with a total duration of about 12 hours.

Second, acoustic models for Tibetan Lhasa based on HMM and DNN are established. Speaker-independent continuous speech recognition for Tibetan Lhasa is realized by combining these acoustic models with the 3-gram language model trained on the Tibetan text corpus. Experimental results show a Tibetan word error rate of 27.64% on the test set.

Finally, this thesis improves the traditional hybrid connectionist temporal classification (CTC)/attention-based end-to-end speech recognition architecture. The hybrid architecture is realized by introducing dynamically adjusted parameters for the linear interpolation between the CTC model and the attention-based model. In the experiments, a bidirectional long short-term memory projection (BLSTMP) network is selected as the encoder, and the hybrid CTC/attention end-to-end architecture is used for both network training and decoding. The acoustic features are 80 mel-scale filterbank coefficients together with pitch features, for a total of 83 dimensions per frame. The experimental results show that end-to-end Tibetan Lhasa speech recognition achieves good recognition performance.
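The linear interpolation between the CTC and attention objectives mentioned above can be illustrated with a small sketch. The abstract does not state how the interpolation weight is adjusted, so the schedule below (`linear_lambda`, with its `start` and `end` values) is purely hypothetical and is not the thesis's method. Only the interpolation form, a weight times the CTC loss plus one minus that weight times the attention loss, follows the commonly used hybrid CTC/attention objective.

```python
def hybrid_loss(ctc_loss: float, att_loss: float, lam: float) -> float:
    """Linear interpolation of the two losses: lam * L_ctc + (1 - lam) * L_att."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * att_loss


def linear_lambda(step: int, total_steps: int,
                  start: float = 0.5, end: float = 0.2) -> float:
    """Hypothetical schedule (an assumption, not from the thesis):
    move the CTC weight linearly from `start` to `end` over training."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps outside the range stay valid.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


if __name__ == "__main__":
    # Illustrative loss values only; they are not results from the thesis.
    for step in (0, 5, 10):
        lam = linear_lambda(step, total_steps=10)
        print(step, round(lam, 3), round(hybrid_loss(2.0, 4.0, lam), 3))
```

With a fixed weight the same `hybrid_loss` function reduces to the conventional hybrid objective; a schedule such as this one is just one possible way to make the weight change during training.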
Keywords/Search Tags: Tibetan Speech Recognition, DNN, CTC, Hybrid CTC/Attention, End-to-End Speech Recognition