Research On Mongolian Online Speech Recognition With Scarce Data Set

Posted on: 2022-05-16 | Degree: Master | Type: Thesis
Country: China | Candidate: J P Zhang
GTID: 2518306542977259 | Subject: Master of Engineering
Abstract/Summary:
End-to-end models simplify the recognition pipeline. They are trained jointly, produce output directly, and do not require forced alignment of the training data. The attention-based sequence-to-sequence model is one implementation of end-to-end speech recognition, and it has achieved state-of-the-art results on the English LibriSpeech data set. Our research on Mongolian online speech recognition builds on the attention-based sequence-to-sequence model, which faces two problems. (1) Mongolian is a low-resource language: labeled Mongolian audio is scarce and unevenly distributed across regions. This causes the attention-based sequence-to-sequence model to overfit and generalize poorly. (2) The first-word delay of the attention-based sequence-to-sequence model is too high for it to be deployed in online speech recognition tasks.

To address these problems, this thesis makes the following contributions to low-resource Mongolian online speech recognition:

(1) Labeled Mongolian data is scarce and unevenly distributed across regions, so we propose a generative adversarial network that generates data for specified regions. The model consists of a conditional speech generator and a multi-phase fusion discriminator. Together they generate Mongolian audio with specified regional characteristics along with the corresponding Mongolian text. Experiments show that the generated data set reduces the word error rate of the attention-based sequence-to-sequence model from 5.1% (with the original Mongolian data set) to 3.5%.

(2) The attention-based sequence-to-sequence model has a high first-word delay, and Mongolian audio features correspond linearly to Mongolian text. Building on this correspondence, we propose a Window-size Control Online Speech Recognition model. The model performs streaming Mongolian recognition through a latency control encoder and a stage aligner. Experiments show that the proposed model improves first-word delay by 33.3% compared with the attention-based sequence-to-sequence model. The cost is a loss of only 6.4% in word error rate and 8.5% in sentence error rate.

(3) Deploying the proposed Window-size Control Online Speech Recognition model in an online environment requires a supporting system. We therefore build a Mongolian online speech recognition system that covers data collection, model training, and online recognition. Deploying the model in this system further demonstrates the feasibility and effectiveness of the Mongolian online speech recognition model.
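The results above are reported as word error rate (WER) and sentence error rate (SER). As a reference, here is a minimal sketch of the standard definitions. WER is the total word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. SER is the fraction of utterances that contain at least one error. The sketch assumes whitespace tokenization of the Mongolian transcripts; the thesis may tokenize differently. All function names are illustrative.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (single-row DP)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of r
                          curr[j - 1] + 1,     # insertion of h
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(refs: List[str], hyps: List[str]) -> float:
    """Corpus WER: total word edits / total reference words (assumes whitespace tokens)."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def sentence_error_rate(refs: List[str], hyps: List[str]) -> float:
    """Fraction of utterances whose hypothesis differs from the reference."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r.split() != h.split())
    return wrong / len(refs)


def relative_change(before: float, after: float) -> float:
    """Relative change, e.g. (3.5 - 5.1) / 5.1 for the WER figures above."""
    return (after - before) / before
```

For example, `relative_change(5.1, 3.5)` is about -0.314. In relative terms, then, a drop from 5.1% to 3.5% WER is a reduction of roughly 31%.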
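The abstract does not describe how the latency control encoder bounds first-word delay. One common way to make an attention encoder streamable is a chunk-based attention mask. Each frame attends to its own chunk and a limited number of past chunks, never to future chunks, so the encoder can emit output once the first chunk has arrived. A full-sequence attention encoder, by contrast, must wait for the whole utterance. The sketch below illustrates that general technique under stated assumptions. It is not the thesis's latency control encoder, and `chunk_size` and `left_chunks` are hypothetical parameters.

```python
from typing import List


def chunk_attention_mask(num_frames: int, chunk_size: int,
                         left_chunks: int) -> List[List[bool]]:
    """mask[t][s] is True if encoder frame t may attend to frame s.

    Frame t sees all frames in its own chunk (look-ahead bounded by the
    chunk boundary) plus up to `left_chunks` earlier chunks, and no
    frames from later chunks.
    """
    mask = []
    for t in range(num_frames):
        chunk = t // chunk_size
        start = max(0, (chunk - left_chunks) * chunk_size)
        end = min(num_frames, (chunk + 1) * chunk_size)
        mask.append([start <= s < end for s in range(num_frames)])
    return mask


def first_output_wait_frames(num_frames: int, chunk_size: int) -> int:
    """Frames that must arrive before the first chunk can be encoded."""
    return min(chunk_size, num_frames)
```

With a 100-frame utterance and a hypothetical `chunk_size` of 16, the chunked encoder can start after 16 frames instead of 100. A larger chunk gives more right context (usually better accuracy) at the cost of higher delay, which mirrors the delay/accuracy trade-off reported above.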
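The abstract motivates the proposed model with the roughly linear correspondence between Mongolian audio features and text, but it does not specify how the stage aligner uses it. One simple way to exploit such a correspondence is a monotonic attention window. The decoder, emitting token `step`, attends only to encoder frames near an expected position that advances at a fixed rate. The sketch below shows that idea under loud assumptions. `frames_per_token` (an average rate that would have to be estimated from training data) and `half_width` are hypothetical parameters, and this is not the thesis's stage aligner.

```python
def aligned_window(step: int, frames_per_token: float, half_width: int,
                   frames_available: int) -> range:
    """Encoder frames the decoder may attend to for output token `step`.

    Assumes token `step` lies near frame (step + 0.5) * frames_per_token
    (a linear audio-text alignment) and allows +/- half_width frames
    around it, clipped to the frames received so far.
    """
    centre = int((step + 0.5) * frames_per_token)
    start = max(0, centre - half_width)
    end = min(frames_available, centre + half_width + 1)
    return range(start, max(start, end))


def window_ready(step: int, frames_per_token: float, half_width: int,
                 frames_available: int) -> bool:
    """True once every frame in the full window for `step` has arrived."""
    centre = int((step + 0.5) * frames_per_token)
    return frames_available >= centre + half_width + 1
```

Because each window depends only on the token index and frames already received, the decoder can emit token `step` as soon as `window_ready` holds. It does not have to wait for the end of the utterance.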
Keywords/Search Tags:Online Speech Recognition, Attention Mechanism, Speech Augmentation, Low Resources Data Set, Mongolian