
Research And Application Of Attention-based Mandarin Speech Recognition

Posted on: 2021-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zong
GTID: 2518306107968759
Subject: Computer technology
Abstract/Summary:
Attention-based end-to-end speech recognition has attracted widespread interest due to the successful application of attention mechanisms in natural language processing. Most existing research on attention-based models, however, focuses on English speech recognition. Previous attempts have shown that attention-based models struggle to converge on Mandarin data because of the nature of Mandarin orthography: Chinese characters provide limited information about the spoken sounds. In addition, attention-based models exhibit conditional dependence between outputs. To this end, this thesis investigates the application of attention-based models to Mandarin and designs and implements a subtitle generation system based on speech recognition.

The accuracy of Mandarin speech recognition is improved by refining the multi-stream self-attention model, which achieves state-of-the-art results in English; specifically, a splicing-block is proposed to replace the convolution-block in the original model. The splicing-block, constructed with a "3-stage splicing" method, is stacked to increase model depth, while factorization reduces model complexity and skip connections enhance feature delivery. The refined model consists of parallel self-attention encoder streams; each stream extracts features with convolution layers sharing a stream-specific dilation rate, and these features are then fed into the attention layer.

During training, the model directly outputs Chinese characters and adopts two optimization techniques, L2 regularization and Gaussian weight noise. The subtitle generation system builds on the trained model: it recognizes user-supplied audio as Chinese text, which is then translated to produce bilingual subtitle files.

In the experimental analysis, the feasibility of the multi-stream self-attention model for Mandarin speech recognition is verified on the AISHELL-1 corpus. Model-related parameters are analyzed experimentally, the optimal configuration is summarized, and a character error rate of 16.20% is achieved with a language model. The model with the splicing-block achieves an 11.1% relative character error rate reduction compared with the convolution-block.
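To make the architecture description concrete, the following is a minimal sketch (not the thesis code) of a multi-stream self-attention encoder: parallel streams share the same convolutional structure but each uses its own dilation rate before a self-attention layer. The layer sizes, number of streams, dilation rates, and the averaging fusion step are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """One encoder stream: dilated 1-D convolutions, then self-attention."""
    def __init__(self, d_model: int, dilation: int, n_heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            # kernel 3 with padding=dilation keeps the time length unchanged
            nn.Conv1d(d_model, d_model, kernel_size=3,
                      dilation=dilation, padding=dilation),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3,
                      dilation=dilation, padding=dilation),
            nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                  # x: (batch, time, d_model)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.attn(y, y, y)        # self-attention over time
        return out

class MultiStreamEncoder(nn.Module):
    """Parallel streams with stream-specific dilation rates; outputs averaged."""
    def __init__(self, d_model: int = 256, dilations=(1, 2, 3)):
        super().__init__()
        self.streams = nn.ModuleList(
            StreamEncoder(d_model, d) for d in dilations)

    def forward(self, x):
        return torch.stack([s(x) for s in self.streams]).mean(dim=0)
```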
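The abstract names a "3-stage splicing" construction but does not define it; the sketch below is therefore one plausible reading rather than the thesis design: three stacked stages, each realized as a factorized (bottleneck) 1-D convolution, wrapped in a skip connection. The bottleneck width and stage count are assumptions.

```python
import torch
import torch.nn as nn

class SplicingBlock(nn.Module):
    """Hypothetical splicing-block: 3 factorized stages plus a skip connection."""
    def __init__(self, d_model: int = 256, bottleneck: int = 64):
        super().__init__()
        stages = []
        for _ in range(3):                  # "3-stage" assumption
            stages += [
                # factorization: d_model -> bottleneck -> d_model
                nn.Conv1d(d_model, bottleneck, kernel_size=3, padding=1),
                nn.Conv1d(bottleneck, d_model, kernel_size=1),
                nn.ReLU(),
            ]
        self.stages = nn.Sequential(*stages)

    def forward(self, x):                   # x: (batch, d_model, time)
        return x + self.stages(x)           # skip connection around the block
```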
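The two optimization techniques named in the abstract can be sketched as follows: L2 regularization via the optimizer's weight-decay term, and Gaussian weight noise added to the parameters before the forward/backward pass and removed before the update. The noise standard deviation and optimizer choice are assumptions.

```python
import torch

def train_step(model, batch, loss_fn, optimizer, noise_std=0.01):
    # Perturb the weights with Gaussian noise (regularizes training).
    noises = []
    with torch.no_grad():
        for p in model.parameters():
            n = torch.randn_like(p) * noise_std
            p.add_(n)
            noises.append(n)
    loss = loss_fn(model(batch["inputs"]), batch["targets"])
    optimizer.zero_grad()
    loss.backward()
    # Restore the clean weights before applying the gradient update.
    with torch.no_grad():
        for p, n in zip(model.parameters(), noises):
            p.sub_(n)
    optimizer.step()
    return loss.item()

# L2 regularization: weight_decay adds an L2 penalty on the parameters.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
```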
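Finally, the bilingual-subtitle step can be sketched as writing recognized Chinese text and its translation into standard SRT entries. The segment timestamps are assumed to come from the recognizer, and translate() is a hypothetical placeholder for the translation component.

```python
def fmt(t: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{int(t % 1 * 1000):03d}"

def write_bilingual_srt(segments, translate, path="subtitles.srt"):
    """segments: iterable of (start_sec, end_sec, chinese_text) tuples."""
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, zh) in enumerate(segments, 1):
            f.write(f"{i}\n{fmt(start)} --> {fmt(end)}\n")
            f.write(f"{zh}\n{translate(zh)}\n\n")  # Chinese line + translation
```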
Keywords/Search Tags:automatic speech recognition, deep learning, end-to-end, attention mechanism