Research On Mongolian Online Speech Recognition With Scarce Data Set

Posted on:2022-05-16

Degree:Master

Type:Thesis

Country:China

Candidate:J P Zhang

Full Text:PDF

GTID:2518306542977259

Subject:Master of Engineering

Abstract/Summary:

End-to-end models offer a simplified architecture, joint training, and direct output, and they do not require forced alignment of the data. The attention-based sequence-to-sequence model is one implementation of end-to-end speech recognition, and it has achieved state-of-the-art ASR results on the English data set LibriSpeech. Our research on Mongolian online speech recognition is based on the attention-based sequence-to-sequence model, which faces the following problems: (1) The Mongolian data set is a low-resource data set: labeled Mongolian audio is scarce and its regional distribution is uneven. This leads to over-fitting and poor generalization of the attention-based sequence-to-sequence model. (2) The first-word delay of the attention-based sequence-to-sequence model is too high for deployment in online speech recognition tasks.

In response to these problems, the following work has been done on low-resource Mongolian online speech recognition: (1) Given the scarcity of labeled Mongolian data and its uneven regional distribution, we propose a generative adversarial network that generates data for specific regions. The model includes a conditional speech generator and a multi-phase fusion discriminator, which generate Mongolian audio with specified regional characteristics and Mongolian text. Experiments show that, compared to the original Mongolian data set, the generated data set reduces the word error rate of the attention-based sequence-to-sequence model from 5.1% to 3.5%. (2) Given the high first-word delay of the attention-based sequence-to-sequence model and the linear correspondence between Mongolian audio features and Mongolian text, we propose a Window-size Control Online Speech Recognition model. The model implements streaming Mongolian online recognition through a latency control encoder and a stage aligner. Experiments show that the proposed model improves first-word delay by 33.3% over the attention-based sequence-to-sequence model, while the losses in word error rate and sentence error rate are only 6.4% and 8.5%, respectively. (3) A system is required to deploy the proposed Window-size Control Online Speech Recognition model in an online environment. We propose a Mongolian online speech recognition system that covers Mongolian data collection, model training, and online speech recognition. Deploying the Mongolian online speech recognition model in this system further demonstrates its feasibility and effectiveness.
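
The abstract reports its results as word error rates and relative changes but does not say how they were computed. As a minimal sketch, assuming the conventional definition of word error rate (word-level edit distance divided by the number of reference words), the calculation could look like the following. The function names and the sample tokens are illustrative and do not come from the thesis.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words (assumed definition)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(before, after):
    """Relative change between two rates, as a fraction of the starting rate."""
    return (before - after) / before


# Placeholder tokens: one substitution and one deletion over four reference words.
print(word_error_rate("a b c d", "a x c"))
# Relative reduction implied by the reported drop from 5.1% to 3.5% WER.
print(round(relative_reduction(5.1, 3.5), 3))
```

Under this definition, the reported drop from 5.1% to 3.5% is an absolute reduction of 1.6 percentage points, or roughly 31% relative to the starting rate.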

Keywords/Search Tags:

Online Speech Recognition, Attention Mechanism, Speech Augmentation, Low Resources Data Set, Mongolian

Related items

1	Research On Mongolian Speech Recognition Acoustic Model Based On Deep Learning
2	Research On Application Of Data Augmentation Based On Different Speech Habits In Speech Recognition In Telephone Scene
3	Application Research Of Attention-based End-to-end Speech Recognition
4	Research On Data Augmentation Technology For Speech Recognition Application
5	Speech Emotion Recognition Based On Neural Network And Attention Mechanism
6	Research On End-to-End Speech Recognition Method Based On Self-Attention Mechanism
7	Speech Emotion Recognition Based On Attention Mechanism
8	Research And Application Of Attention-based Mandarin Speech Recognition
9	Research On Speech Signal Recognition Based On Deep Two-way GRU And Attention Mechanism
10	Research On Speech Emotion Recognition Technology Based On Deep Learning