
Research On Amdo Tibetan Speech Recognition Technology Based On Deep Learning

Posted on: 2022-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: T B Suan
GTID: 2518306482473324
Subject: Computer application technology
Abstract/Summary:
Speech recognition is one of the most important research directions in human-computer interaction. It is key to connecting humans and machines, and to the development of the information society towards intelligence and automation. With the development of deep learning theory and technology, speech recognition with deep neural networks has gradually become a research hotspot. Compared with traditional neural networks, deep learning models can mine the useful temporal information in the input features and improve the discriminative and expressive power of those features. Compared with research on speech recognition for mainstream international languages, research on Tibetan speech recognition is still at the development stage. By analyzing the phonemic features of Tibetan characters, this thesis studies Tibetan speech recognition based on deep learning. The main work is as follows:

(1) The thesis analyzes the structure and spelling rules of Tibetan characters, as well as the phonemic characteristics of their basic components, and uses the Maximum Matching Algorithm to convert Tibetan characters into the corresponding International Phonetic Alphabet symbols. To combine the acoustic model with the language model more effectively, a conversion strategy between broad transcription and narrow transcription is proposed. On this basis, an Amdo Tibetan grapheme-to-phoneme (word-to-sound) conversion system is designed.

(2) Based on deep learning, the acoustic model and the language model for Tibetan speech recognition are designed separately. First, the acoustic model uses a convolutional neural network to reduce the feature dimension, with connectionist temporal classification (CTC) as the loss function, to align and classify the Tibetan speech feature sequence against the phonetic symbol sequence. Second, a Transformer language model encodes the phonetic sequence and decodes it into a Tibetan sentence.

(3) Corpora with different modeling units are built, and a speech dataset of the Lhasa dialect and the Amdo dialect is used as the training set for the acoustic model. Comparison experiments against the benchmark model verify the effectiveness of the proposed method. The experimental data show that the Tibetan speech recognition system with the deep neural network structure proposed here can achieve better results with a corpus of about 114 hours.
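To make the Maximum Matching step in item (1) more concrete, the following is a minimal sketch of forward maximum matching, assuming a simple lookup table from written units to phonetic symbols. The function name `forward_max_match`, the table `TOY_LEXICON`, and its Latin placeholder entries are hypothetical and chosen only for illustration; they are not the thesis's actual conversion table or Tibetan-to-IPA rules, and the thesis may use a different variant of the algorithm.

```python
def forward_max_match(text, lexicon):
    """Greedy longest-prefix matching; returns the mapped symbols in order."""
    max_len = max(len(key) for key in lexicon)
    symbols = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in lexicon:
                symbols.append(lexicon[chunk])
                i += size
                break
        else:
            # No entry matches here: keep the character unchanged and move on.
            symbols.append(text[i])
            i += 1
    return symbols


# Hypothetical placeholder table: keys stand in for written units,
# values stand in for their phonetic symbols (not real Tibetan data).
TOY_LEXICON = {"k": "K", "ka": "KA", "kar": "KAR", "m": "M", "ma": "MA"}

print(forward_max_match("karma", TOY_LEXICON))
```

In this sketch the longest entry always wins, so "kar" is preferred over "ka" or "k" at the same position; characters with no entry are passed through unchanged rather than dropped.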
Keywords: Amdo Tibetan, speech recognition, phonemic features, acoustic model, language model