Design of an End-to-End Amdo Tibetan Speech Recognition System Based on Deep Learning

Posted on: 2022-01-12, Degree: Master, Type: Thesis
Country: China, Candidate: J Kang
GTID: 2518306482473214, Subject: Master of Engineering
Abstract/Summary:
Speech recognition for resource-rich languages such as Chinese, English, and French has reached good recognition performance thanks to the rapid development of speech recognition research at home and abroad. In particular, the rise of end-to-end technology has avoided the inherent defects of multi-module pipelines and reduced the complexity of speech recognition models. For Tibetan, however, progress has been slow because of the difficulty of constructing a corpus and the unique characteristics of the language, so research that improves the performance of Tibetan speech recognition systems is critical.

At present, Tibetan speech recognition faces several challenges. First, there are no authoritative, open Tibetan corpus resources, and corpus construction is not easy. Second, in terms of feature extraction, current features do not represent Tibetan speech well. Third, Tibetan speech recognition models still suffer from slow training and low recognition rates.

This research designs an end-to-end Amdo Tibetan speech recognition system based on deep learning. Its contributions are as follows:

(1) An Amdo Tibetan corpus for continuous speech recognition was constructed, containing 10 speakers and a total of 16,000 sentences. Data augmentation was performed on the constructed corpus, and experiments verified its role when Tibetan data are insufficient.

(2) During data preprocessing, scripts written for the Praat software implemented endpoint detection, segmentation, annotation, and other functions. The spectrogram of the speech signal and 40-dimensional Fbank features were extracted as the input of the network model.

(3) An acoustic model was developed by combining a recurrent neural network with a convolutional neural network, which effectively captures the temporal semantic information of speech as well as local spatial information in the frequency domain. On this basis, Connectionist Temporal Classification (CTC), the attention mechanism, and transfer learning were introduced to improve the performance of the model, and an Amdo Tibetan speech recognition system was built on a Web framework. The pre-trained model performs better than the model trained from scratch, trains faster, and has lower hardware requirements, which demonstrates the feasibility and effectiveness of transferring a model trained on a source language to the target language; it reaches a word error rate of 26.6%. In addition, experiments using the hybrid augmented dataset yielded a 1.7% performance improvement over the baseline.
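The word error rate quoted above is conventionally the edit distance between the recognized token sequence and the reference, divided by the reference length. The following is a minimal sketch of that standard definition, not the thesis's own evaluation code; the function name `word_error_rate` is hypothetical, and the abstract does not say whether tokens are words, syllables, or characters, so the function simply takes pre-tokenized lists.

```python
def word_error_rate(reference, hypothesis):
    """Return the Levenshtein distance between two token lists,
    divided by the number of reference tokens.

    Illustrative only: the tokenization unit is an assumption
    left to the caller.
    """
    if not reference:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference tokens
        for j, hyp_tok in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1        # reference token missing from hypothesis
            insertion = curr[j - 1] + 1   # extra token in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(reference)


if __name__ == "__main__":
    ref = "a b c d".split()
    hyp = "a x c".split()
    # One substitution (b -> x) and one deletion (d) over 4 reference tokens.
    print(word_error_rate(ref, hyp))
```

Because insertions are counted, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.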
Keywords/Search Tags: Tibetan speech recognition, knowledge migration, end-to-end, connectionist temporal classification, data enhancement