Research On Amdo Tibetan Speech Recognition Technology Based On Deep Learning

Posted on:2022-02-28

Degree:Master

Type:Thesis

Country:China

Candidate:T B Suan

GTID:2518306482473324

Subject:Computer application technology

Abstract/Summary:

Speech recognition is among the most important research directions in human-computer interaction. It is key to connecting humans and machines, and to moving the information society toward intelligence and automation. With the development of deep learning theory and technology, speech recognition based on deep neural networks has become a research hotspot. Compared with traditional neural networks, deep learning models can mine the temporal information in the input features and improve the discriminability and expressive power of those features. Compared with speech recognition research for major international languages, research on Tibetan speech recognition is still at a developing stage. By analyzing the phonemic features of Tibetan characters, this thesis studies Tibetan speech recognition based on deep learning. The main work is as follows:

(1) The structure and spelling rules of Tibetan characters are analyzed, along with the phonemic characteristics of their basic components, and the Maximum Matching Algorithm is used to convert Tibetan characters to the corresponding International Phonetic Alphabet symbols. To combine the acoustic model with the language model more effectively, a conversion strategy between broad (wide-style) transcription and narrow (strict-style) transcription is proposed. On this basis, an Amdo Tibetan character-to-sound (grapheme-to-phoneme) conversion system is designed.

(2) An acoustic model and a language model for Tibetan speech recognition are designed based on deep learning. First, the acoustic model uses a convolutional neural network to reduce the feature dimension and uses connectionist temporal classification (CTC) as the loss function to align and classify Tibetan speech feature sequences against phonetic symbol sequences. Second, a Transformer language model encodes the phonetic symbol sequence and decodes it into Tibetan sentences.

(3) Corpora with different modeling units are built, and speech datasets of the Lhasa dialect and the Amdo dialect are used as the training set for the acoustic model. Comparison experiments against the baseline model verify the effectiveness of the proposed method. The experimental results show that the proposed deep neural network Tibetan speech recognition system achieves better results with about 114 hours of speech corpus.
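
The maximum-matching conversion mentioned in item (1) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `max_match_convert`, the `unknown` marker, and the lookup table `TOY_TABLE` are all hypothetical, and the table uses made-up Latin placeholders rather than real Tibetan components or real phonetic symbols. The idea shown is forward maximum matching: at each position, take the longest substring found in the table, emit its mapped symbol, and continue after it.

```python
def max_match_convert(text, table, unknown="?"):
    """Forward maximum matching.

    At each position, try the longest candidate substring first and
    shrink it until a key of `table` is found. Characters that match
    no key are emitted as `unknown` and skipped one at a time.
    """
    max_len = max(map(len, table)) if table else 1
    symbols = []
    i = 0
    while i < len(text):
        # Longest candidate first, never reading past the end of the text.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in table:
                symbols.append(table[piece])
                i += size
                break
        else:
            # No key starts at position i: mark it and move on.
            symbols.append(unknown)
            i += 1
    return symbols


# Hypothetical toy table (placeholder units, NOT real Tibetan-to-IPA rules).
TOY_TABLE = {"k": "K", "kh": "KH", "a": "A", "ang": "ANG", "n": "N", "g": "G"}

result = max_match_convert("khang", TOY_TABLE)
print(result)
```

With this toy table, the longest-first rule picks the two-character unit `kh` over the single character `k`, and the three-character unit `ang` over `a`, which is the behavior that distinguishes maximum matching from a character-by-character lookup.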

Keywords/Search Tags:

Amdo Tibetan, speech recognition, phonemic features, acoustic model, language model

Related items

1	Research On Amdo Tibetan Speech Recognition Technology Based On MRDCNN_CTC & Transformer
2	Research And System Realization Of Tibetan Continuous Speech Recognition Technology
3	Research On End-to-end Tibetan Speech Recognition Based On Deep Learning
4	Amdo Tibetan Speech Recognition Based On Deep Neural Network
5	A Study On The Extraction Of Speech Depth In Tibetan Language And Its Speech Recognition
6	Research On Speech Enhancement And Recognition Of Tibetan Amdo Dialect
7	Research On Tibetan Language Model For Continuous Speech Recognition
8	Research On End-to-End Non-Autoregressive Model-Based Amdo Tibetan Speech Synthesis Technology
9	Research On Speech Synthesis Technology Of Amdo Tibetan Based On Seq2Seq＆WaveNet
10	The Research On Segmentation Acoustic Model Based On MPE Tibetan Lhasa Dialect