
End-to-end Multi-accent Mandarin Speech Recognition

Posted on: 2021-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: W Yang
Full Text: PDF
GTID: 2518306497466804
Subject: Software engineering
Abstract/Summary:
With the development of artificial intelligence technology, intelligent devices have become standard in work and daily life, and human-computer interaction technology plays an important role. Speech interaction has gradually replaced traditional methods such as touch, gestures, and keyboards, and has become a new core technology of human-computer interaction. As the first step of speech interaction, speech recognition has made extraordinary achievements, but it still faces many problems. For accented speech, the main problems are the lack of training data for minority languages, the difficulty of recognition in complex acoustic environments, and results that are hard to optimize in multilingual settings. The end-to-end speech recognition system has been the most popular speech recognition architecture in recent years. It features a simple model, easy optimization, and good performance in multi-language recognition. To address these accent-related problems, this thesis adopts an end-to-end speech recognition framework for recognizing accented Mandarin. The specific work is as follows:

(1) To address the difficulties that traditional speech recognition systems have with accented speech, such as a low recognition rate and poor robustness across different accents, this thesis builds baseline networks for two end-to-end methods. By improving the decoding algorithm and the decoding network of the attention mechanism, the decoding speed and recognition rate of both end-to-end methods are improved. A comparison with the traditional method shows experimentally that the end-to-end methods have a certain advantage in accented Mandarin recognition.

(2) To improve the recognition rate and training speed of the end-to-end framework on accented Mandarin speech recognition tasks, first, the multi-head attention mechanism and the connectionist temporal classification (CTC) method are combined to build a hybrid multi-head attention and CTC end-to-end model. Second, to combine the advantages of the two end-to-end speech recognition methods more effectively, a multi-objective training and joint decoding method based on complete decoding is proposed. Finally, the training process and recognition rate are analyzed and compared on a public data set, which shows that the proposed model based on the multi-head attention mechanism and CTC effectively improves the performance of end-to-end recognition of accented Mandarin.

(3) To further improve the recognition rate of the hybrid multi-head attention and CTC model on accented Mandarin, the attention model of the encoder-decoder is improved: residual layers and normalization layers are combined to build a deep multi-head attention encoder-decoder architecture. In addition, to strengthen the model's ability to extract accent-related information in deep networks, a language model is trained to improve the recognition rate. On the public data set, the model is analyzed by varying the number of encoder-decoder layers and the dropout (random inactivation) rate. The experimental results show that the proposed model effectively improves the recognition rate on accented speech recognition tasks.
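The multi-objective training and joint decoding mentioned in item (2) are not specified in this abstract. A common formulation in hybrid CTC/attention systems interpolates the two losses during training and the two log-probabilities when scoring hypotheses; the sketch below illustrates that general idea only. The function names, the `ctc_weight` value of 0.3, and the reading of "complete decoding" as scoring finished hypotheses are all assumptions, not the thesis's actual method.

```python
def multi_objective_loss(ctc_loss, att_loss, ctc_weight=0.3):
    """Interpolate the CTC loss and the attention-decoder loss.

    ctc_weight is a hypothetical hyperparameter in [0, 1]; the abstract
    does not state the value the thesis uses.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def joint_rescore(hypotheses, ctc_weight=0.3):
    """Rank complete hypotheses by an interpolated log-probability.

    hypotheses: iterable of (text, ctc_log_prob, att_log_prob) tuples,
    each describing a finished (complete) hypothesis.
    Returns a list of (joint_score, text) pairs, best first.
    """
    scored = [
        (ctc_weight * ctc_lp + (1.0 - ctc_weight) * att_lp, text)
        for text, ctc_lp, att_lp in hypotheses
    ]
    # Higher (less negative) joint log-probability ranks first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

With `ctc_weight=0.0` the ranking falls back to the attention decoder alone, and with `ctc_weight=1.0` to CTC alone, so the weight controls how strongly the CTC alignment constrains the attention output.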
Keywords/Search Tags: accent, hybrid CTC/Attention end-to-end model, multi-head attention, connectionist temporal classification, speech recognition