Font Size: a A A

End-to-End Speech Recognition Based On Convolutional Neural Network And Gated Recurrent Unit

Posted on:2022-01-25Degree:MasterType:Thesis
Country:ChinaCandidate:Y H JinFull Text:PDF
GTID:2518306737975979Subject:Applied Statistics
Abstract/Summary:PDF Full Text Request
Natural Language Processing(NLP)is an indispensable research topic in the current computer development.With the emerging development of the hot artificial intelligence,more and more natural language processing methods have gradually penetrated into people's daily lives,such as image recognition,information processing,machine translation,sentiment analysis,text classification,language recognition,etc.The field has achieved good results in natural language processing,but the research on natural language processing is still a key topic now.People expect that the development of artificial intelligence can bring more convenience to people's work and life,and can liberate the limitations of the hands on the traditional mouse and keyboard through voice interaction,so that human-computer interaction can achieve a simpler and more convenient way of interaction.Speech recognition is part of the field of natural language processing.The research on speech recognition will have important research significance and practical value in actual work and life for a long period of time in the future.Based on the limitations of traditional speech recognition models such as complex training,assumptions,inability to align labels,etc.,according to the characteristics and advantages of different neural networks,combined with the image feature processing capabilities of convolutional neural networks and the acquisition of gated recurrent units Multi-information capabilities,using the two as a neural network for model construction.Due to the instability of the speech signal,mandatory label alignment is required.This paper uses an end-to-end speech recognition method to solve the problem of non-stationary label alignment by inserting blank placeholders into the sequence by connecting the timing classification criteria.On this basis,this paper constructs an acoustic model of CNN-GRU-CTC,and the model training results can achieve a certain degree of accuracy performance.
Keywords/Search Tags:natural Language processing, speech recognition, convolutional neural network, end-to-end
PDF Full Text Request
Related items