End-to-end Multi-accent Mandarin Speech Recognition

Posted on:2021-04-23

Degree:Master

Type:Thesis

Country:China

Candidate:W Yang

GTID:2518306497466804

Subject:Software engineering

Abstract/Summary:

With the development of artificial intelligence, intelligent devices have become standard in work and daily life, and human-computer interaction plays an increasingly important role. Speech interaction is gradually replacing traditional input methods such as touch, gestures, and keyboards, and is becoming a new core technology of human-computer interaction. As the first step of speech interaction, speech recognition has made remarkable progress, but it still faces many problems. For accented speech in particular, the main difficulties are a lack of training data for low-resource languages, poor recognition in complex acoustic environments, and results that are hard to optimize in multilingual settings. End-to-end systems have been the most popular speech recognition architecture in recent years: the model is simple, easy to optimize, and performs well in multi-language recognition. To address these accent problems, this thesis adopts an end-to-end speech recognition framework for accented Mandarin recognition. The specific work is as follows:

(1) To address the problems that traditional speech recognition systems have with accented speech, such as low recognition rates and poor robustness across different accents, this thesis builds baseline networks for two end-to-end methods. Improving the decoding algorithm and the decoding network of the attention mechanism raises the decoding speed and recognition rate of both methods. Experiments comparing the end-to-end methods with the traditional method show that the end-to-end methods have an advantage in accented Mandarin recognition.

(2) To improve the recognition rate and training speed of the end-to-end framework on accented Mandarin speech recognition tasks, the multi-head attention mechanism and the connectionist temporal classification (CTC) method are first combined to build a hybrid multi-head attention and CTC end-to-end model. Second, to combine the advantages of the two end-to-end speech recognition methods more effectively, a multi-objective training and joint decoding method based on complete decoding is proposed. Finally, the training process and recognition rate are analyzed and compared on a public data set, showing that the proposed model based on multi-head attention and CTC effectively improves the performance of end-to-end accented Mandarin speech recognition.

(3) To further improve the recognition rate of the hybrid multi-head attention and CTC model on accented Mandarin, the attention model of the encoder-decoder is improved: residual layers and normalization layers are combined to build a deep multi-head attention encoder-decoder architecture. In addition, to strengthen the deep network's ability to extract accent features, a language model is trained to improve the recognition rate. On the public data set, the model is analyzed by varying the number of encoder-decoder layers and the dropout rate. The experimental results show that the proposed model effectively improves the recognition rate on accented speech recognition tasks.
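The abstract does not give the formulas behind the multi-objective training and joint decoding in item (2). As a minimal sketch, assuming the weighted-sum formulation commonly used in hybrid CTC/attention systems, the two objectives can be interpolated with a single weight, and the same weight can rescore complete hypotheses at decoding time. The function names, the `ctc_weight` parameter, and the example scores below are illustrative assumptions, not values from the thesis.

```python
def joint_loss(ctc_loss, att_loss, ctc_weight=0.3):
    """Multi-objective training loss: w * L_ctc + (1 - w) * L_att.

    ctc_weight is an assumed hyperparameter; the abstract does not state it.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def joint_rescore(hypotheses, ctc_weight=0.3):
    """Pick the best complete hypothesis by a joint log-probability score.

    hypotheses: list of (text, ctc_log_prob, attention_log_prob) tuples,
    one tuple per fully decoded candidate (hypothetical input format).
    """
    if not hypotheses:
        raise ValueError("need at least one hypothesis")

    def score(hyp):
        _, ctc_logp, att_logp = hyp
        return ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp

    return max(hypotheses, key=score)[0]


# Made-up scores for two candidate transcriptions of one utterance.
candidates = [("ni hao", -4.0, -1.0), ("li hao", -2.0, -3.5)]
best = joint_rescore(candidates, ctc_weight=0.3)
```

With `ctc_weight=1.0` the ranking depends on the CTC score alone, and with `ctc_weight=0.0` on the attention decoder alone; intermediate values let the CTC alignment constrain the attention decoder's output.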

Keywords/Search Tags:

accent, hybrid CTC/Attention end-to-end model, multi-head attention, connectionist temporal classification, speech recognition

Related items

1	Research On End-to-End Speech Recognition Method Based On Self-Attention Mechanism
2	Study On Attention Based Speech Emotion Recognition
3	Research On CTC-based And Attention-based End-to-end Speech Recognition
4	Research And Implementation Of End-to-End Speech Recognition Algorithm
5	Research On End-to-End Speech Recognition Based On GRU And Self-Attention Mechanism
6	Research And Implementation Of Multi-accent Mandarin Speech Recognition Based On Neural Network
7	Research On Speech Emotion Recognition Algorithm Based On Deep Learning
8	Research On Connectionist Temporal Classification In Speech Recognition
9	Research On Multi-accent Chinese Speech Recognition Approaches Based On Time Convolution Network
10	Chinese Speech Recognition System Based On CLDNN Hybrid Model