
Research And Application Of Attention-based Mandarin Speech Recognition

Posted on: 2021-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zong
GTID: 2518306107968759
Subject: Computer technology
Abstract/Summary:
Attention-based end-to-end speech recognition has attracted widespread interest due to the successful application of attention mechanisms in natural language processing. Most existing research on attention-based models, however, focuses on English speech recognition. Previous attempts have shown that attention-based models struggle to converge on Mandarin data because of the nature of Mandarin orthography: Chinese characters provide limited information about the spoken sounds. In addition, attention-based models exhibit conditional dependence between outputs. To this end, this thesis investigates the application of attention-based models to Mandarin and designs and implements a subtitle generation system based on speech recognition.

The accuracy of Mandarin speech recognition is improved by refining the multi-stream self-attention model, which achieves state-of-the-art results in English; specifically, a splicing-block is proposed to replace the convolution-block in the original model. The splicing-block, constructed with a "3-stage splicing" method, is stacked to increase model depth, while factorization reduces model complexity and skip connections enhance feature delivery. The refined model consists of parallel self-attention encoder streams; each stream extracts features with convolution layers sharing a stream-specific dilation rate, and these features are then fed into the attention layer.

During training, the model directly outputs Chinese characters and adopts two optimization techniques, L2 regularization and Gaussian weight noise. The subtitle generation system builds on the trained model: it recognizes user-supplied audio as Chinese text, which is then translated to produce bilingual subtitle files.

In the experimental analysis, the feasibility of the multi-stream self-attention model for Mandarin speech recognition is verified on the AISHELL-1 corpus. Model-related parameters are analyzed experimentally, the optimal configuration is summarized, and a character error rate of 16.20% is achieved with a language model. The model with the splicing-block achieves an 11.1% relative character error rate reduction compared with the convolution-block.
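To make the architecture description concrete, the following is a minimal sketch (not the thesis code) of a multi-stream self-attention encoder: parallel streams share the same convolutional structure but each uses its own dilation rate before a self-attention layer. The layer sizes, number of streams, dilation rates, and the averaging fusion step are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """One encoder stream: dilated 1-D convolutions, then self-attention."""
    def __init__(self, d_model: int, dilation: int, n_heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            # kernel 3 with padding=dilation keeps the time length unchanged
            nn.Conv1d(d_model, d_model, kernel_size=3,
                      dilation=dilation, padding=dilation),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3,
                      dilation=dilation, padding=dilation),
            nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                  # x: (batch, time, d_model)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.attn(y, y, y)        # self-attention over time
        return out

class MultiStreamEncoder(nn.Module):
    """Parallel streams with stream-specific dilation rates; outputs averaged."""
    def __init__(self, d_model: int = 256, dilations=(1, 2, 3)):
        super().__init__()
        self.streams = nn.ModuleList(
            StreamEncoder(d_model, d) for d in dilations)

    def forward(self, x):
        return torch.stack([s(x) for s in self.streams]).mean(dim=0)
```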
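The abstract names a "3-stage splicing" construction but does not define it; the sketch below is therefore one plausible reading rather than the thesis design: three stacked stages, each realized as a factorized (bottleneck) 1-D convolution, wrapped in a skip connection. The bottleneck width and stage count are assumptions.

```python
import torch
import torch.nn as nn

class SplicingBlock(nn.Module):
    """Hypothetical splicing-block: 3 factorized stages plus a skip connection."""
    def __init__(self, d_model: int = 256, bottleneck: int = 64):
        super().__init__()
        stages = []
        for _ in range(3):                  # "3-stage" assumption
            stages += [
                # factorization: d_model -> bottleneck -> d_model
                nn.Conv1d(d_model, bottleneck, kernel_size=3, padding=1),
                nn.Conv1d(bottleneck, d_model, kernel_size=1),
                nn.ReLU(),
            ]
        self.stages = nn.Sequential(*stages)

    def forward(self, x):                   # x: (batch, d_model, time)
        return x + self.stages(x)           # skip connection around the block
```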
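The two optimization techniques named in the abstract can be sketched as follows: L2 regularization via the optimizer's weight-decay term, and Gaussian weight noise added to the parameters before the forward/backward pass and removed before the update. The noise standard deviation and optimizer choice are assumptions.

```python
import torch

def train_step(model, batch, loss_fn, optimizer, noise_std=0.01):
    # Perturb the weights with Gaussian noise (regularizes training).
    noises = []
    with torch.no_grad():
        for p in model.parameters():
            n = torch.randn_like(p) * noise_std
            p.add_(n)
            noises.append(n)
    loss = loss_fn(model(batch["inputs"]), batch["targets"])
    optimizer.zero_grad()
    loss.backward()
    # Restore the clean weights before applying the gradient update.
    with torch.no_grad():
        for p, n in zip(model.parameters(), noises):
            p.sub_(n)
    optimizer.step()
    return loss.item()

# L2 regularization: weight_decay adds an L2 penalty on the parameters.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
```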
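Finally, the bilingual-subtitle step can be sketched as writing recognized Chinese text and its translation into standard SRT entries. The segment timestamps are assumed to come from the recognizer, and translate() is a hypothetical placeholder for the translation component.

```python
def fmt(t: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{int(t % 1 * 1000):03d}"

def write_bilingual_srt(segments, translate, path="subtitles.srt"):
    """segments: iterable of (start_sec, end_sec, chinese_text) tuples."""
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, zh) in enumerate(segments, 1):
            f.write(f"{i}\n{fmt(start)} --> {fmt(end)}\n")
            f.write(f"{zh}\n{translate(zh)}\n\n")  # Chinese line + translation
```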
Keywords/Search Tags:automatic speech recognition, deep learning, end-to-end, attention mechanism