Research On Tibetan Lhasa Dialect Speech Recognition

Posted on:2020-04-25

Degree:Master

Type:Thesis

Country:China

Candidate:G Zhou

Full Text:PDF

GTID:2428330572485994

Subject:Intelligent information processing

Abstract/Summary:

Speech recognition is an indispensable technology for realizing natural human-machine interfaces, and it has been very successful in applications such as voice search. The Tibetan people are an important part of China. Research on Tibetan speech recognition can help reduce the language barrier between Tibetans and other ethnic groups in China, promoting exchange and mutual understanding between them. It can also help accelerate development in the Tibetan economy, science and technology, culture, and other fields, and provide better services for the Tibetan people.

To realize continuous speech recognition for the Tibetan Lhasa dialect, this thesis analyses the linguistic characteristics of Tibetan and establishes a Tibetan Lhasa speech database. Acoustic models for Tibetan Lhasa speech are built on the hidden Markov model (HMM) and on deep neural networks (DNN), respectively. The Tibetan language model is a word 3-gram model trained on text from the Tibetan text corpus. Furthermore, this thesis applies end-to-end speech recognition technology to Tibetan Lhasa speech. The experimental results show that end-to-end Tibetan Lhasa speech recognition achieves good recognition performance.

The main work and innovations of this thesis are as follows:

Firstly, a Tibetan Lhasa speech corpus is established. The Tibetan text corpus contains a total of 18000 sentences. The recordings involve 11 speakers, 9 female and 2 male Tibetan students, all of them young college students fluent in the Tibetan Lhasa dialect. Each student reads the Tibetan text aloud as a monologue, with an average of 8 Tibetan characters per sentence. A total of 7584 Tibetan Lhasa utterances are recorded, with a total duration of about 12 hours.

Secondly, acoustic models for Tibetan Lhasa speech based on HMM and DNN are established. Speaker-independent continuous speech recognition for Tibetan Lhasa is realized by combining these acoustic models with the 3-gram language model trained on the Tibetan text corpus. Experimental results show that the Tibetan word error rate reaches 27.64% on the test set.

Finally, this thesis improves the traditional hybrid connectionist temporal classification (CTC)/attention-based end-to-end speech recognition architecture. The hybrid architecture is realized by introducing dynamic adjustment parameters that linearly interpolate between the CTC model and the attention-based model. In the experiments, a bidirectional long short-term memory projection (BLSTMP) network is selected as the encoder, and the hybrid CTC/attention-based end-to-end architecture is used for network training and decoding. The 80 mel-scale filterbank coefficients, along with pitch features, form an 83-dimensional acoustic feature vector per frame. With this setup, the thesis realizes end-to-end recognition of Tibetan Lhasa speech, and the experimental results show that it achieves good recognition performance.
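The linear interpolation between the CTC model and the attention-based model can be sketched as below. This is only an illustration under stated assumptions: the abstract does not say how the dynamic adjustment parameter is computed, so the linear schedule in `interpolation_weight`, its `start` and `end` values, and all function names are hypothetical and are not the thesis's method.

```python
def interpolation_weight(epoch: int, total_epochs: int,
                         start: float = 0.5, end: float = 0.2) -> float:
    """Hypothetical schedule: move the CTC weight linearly from start to end.

    The thesis only states that the weight is adjusted dynamically; this
    particular schedule is an assumption made for illustration.
    """
    if total_epochs <= 1:
        return end
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (end - start) * frac


def hybrid_loss(ctc_loss: float, att_loss: float, lam: float) -> float:
    """Training objective: lam * L_ctc + (1 - lam) * L_att."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * att_loss


def hybrid_score(ctc_logp: float, att_logp: float, lam: float) -> float:
    """Decoding score for one hypothesis, interpolating log-probabilities."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ctc_logp + (1.0 - lam) * att_logp


if __name__ == "__main__":
    # Print the weight and the combined loss for a few epochs,
    # using made-up loss values.
    for epoch in (0, 5, 10):
        lam = interpolation_weight(epoch, total_epochs=11)
        print(epoch, lam, hybrid_loss(ctc_loss=2.0, att_loss=4.0, lam=lam))
```

With `lam = 1` the objective reduces to pure CTC and with `lam = 0` to pure attention; intermediate values trade the monotonic alignment constraint of CTC against the flexibility of the attention decoder.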

Keywords/Search Tags:

Tibetan Speech Recognition, DNN, CTC, Hybrid CTC/Attention, End to End Speech Recognition

Related items

1	Research On Audio And Video Speech Recognition In Tibetan Lhasa Dialect
2	Research On Tibetan Speech Recognition Based On Speech Spectral Features
3	Research On Tibetan Non-specific Continuous Speech Recognition Based On Deep Learning
4	Research On Online Tibetan Speech Recognition System
5	Design Of End-to-end Amdo Tibetan Speech Recognition System Based On Deep Learning
6	Technology Of Tibetan Speech Recognition Based On Fast Walsh Transform
7	Research And System Realization Of Tibetan Continuous Speech Recognition Technology
8	Research On Tibetan Speech Recognition Based On CNN Multi-feature Fusion
9	Research On Tibetan Speech Emotion Recognition Method
10	A Study On The Extraction Of Speech Depth In Tibetan Language And Its Speech Recognition