Design Of End-to-end Amdo Tibetan Speech Recognition System Based On Deep Learning

Posted on:2022-01-12

Degree:Master

Type:Thesis

Country:China

Candidate:J Kang

GTID:2518306482473214

Subject:Master of Engineering

Abstract/Summary:

Speech recognition for resource-rich languages such as Chinese, English, and French has achieved good performance thanks to the rapid development of speech recognition research at home and abroad. In particular, the rise of end-to-end technology has avoided the inherent defects of multi-module pipelines and reduced the complexity of speech recognition models. However, because a corpus is difficult to construct and the Tibetan language has unique characteristics, progress in Tibetan speech recognition has been slow, so research that improves the performance of Tibetan speech recognition systems is critical. At present, Tibetan speech recognition faces several challenges: first, there is no authoritative, open Tibetan corpus, and building one is not easy; second, in terms of feature extraction, current features do not represent Tibetan speech well; third, Tibetan speech recognition models still suffer from slow training and low recognition rates.

This research designs an end-to-end Amdo Tibetan speech recognition system based on deep learning. Its contributions are as follows:

(1) An Amdo Tibetan corpus for continuous speech recognition was constructed, containing 10 speakers and a total of 16,000 sentences. Data enhancement (augmentation) was applied to the constructed corpus, and its role when Tibetan data is insufficient was verified through experiments.

(2) During data preprocessing, scripts written for the Praat software implemented endpoint detection, segmentation, annotation, and other functions, and the spectrogram of the speech signal and 40-dimensional Fbank features were extracted as the input of the network model.

(3) An acoustic model was developed by combining a recurrent neural network with a convolutional neural network, which effectively captures the temporal information of speech as well as local spatial information in the frequency domain. On this basis, Connectionist Temporal Classification, the attention mechanism, and transfer learning were introduced to improve the performance of the model, and an Amdo Tibetan speech recognition system was built on a Web framework.

The model using pre-training performs better than the model trained from scratch: it improves speed and reduces hardware requirements, and it demonstrates the feasibility and effectiveness of model migration training from the source language to the target language, with a word error rate of 26.6%. In addition, experiments using the hybrid enhanced dataset resulted in a 1.7% performance improvement compared to the baseline.
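The word error rate quoted above is conventionally computed as the edit distance between the recognized and reference token sequences, divided by the number of reference tokens. The following is a minimal sketch of that standard calculation, not the thesis's own evaluation code. The function names `edit_distance` and `word_error_rate` are hypothetical, and splitting transcripts on whitespace is an assumption, since the abstract does not state which token unit (word, syllable, or character) was scored.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total edit errors / total reference tokens."""
    total_errors = 0
    total_ref_tokens = 0
    for ref, hyp in zip(references, hypotheses):
        # Assumption: tokens are separated by whitespace.
        ref_tokens = ref.split()
        hyp_tokens = hyp.split()
        total_errors += edit_distance(ref_tokens, hyp_tokens)
        total_ref_tokens += len(ref_tokens)
    if total_ref_tokens == 0:
        raise ValueError("reference transcripts contain no tokens")
    return total_errors / total_ref_tokens
```

For example, scoring the hypothesis "a x c" against the reference "a b c d" counts one substitution and one deletion over four reference tokens, giving a rate of 0.5. Errors are pooled over the whole test set before dividing, so long utterances weigh more than short ones.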

Keywords/Search Tags:

Tibetan speech recognition, knowledge migration, end-to-end, connectionist temporal classification, data enhancement

Related items

1	Research On Tibetan Speech Recognition Based On Bidirectional Recurrent Neural Network
2	Research On Connectionist Temporal Classification In Speech Recognition
3	Amdo Tibetan Speech Recognition Based On Deep Neural Network
4	Research And Implementation Of End-to-End Long-term Speech Recognition Model Base On RNN-Transducer
5	Research And Implementation Of End-to-End Speech Recognition Algorithm
6	The Design And FPGA Verification Of End-to-end Mandarin Speech Recognition Based On CNN
7	Research On Speech Emotion Recognition Algorithm Based On Deep Learning
8	Research On Speech Enhancement And Recognition Of Tibetan Amdo Dialect
9	Research On CTC-based And Attention-based End-to-end Speech Recognition
10	Research On End-to-End Speech Recognition Method Based On Self-Attention Mechanism