Research For Continuous Speech Recognition Based On Deep Neural Networks

Posted on: 2019-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: M H Li
Full Text: PDF
GTID: 2428330548961222
Subject: Engineering
Abstract/Summary:
Speech recognition is a significant research branch of pattern recognition. It is also an interactive technology that attracts close attention in the era of artificial intelligence. Over the past 50 years, traditional speech recognition technology has gradually stabilized, and with the widespread attention paid to deep neural network theory in the early 21st century, speech recognition technology has also developed rapidly. From theoretical research to practical product applications, various deep neural network models have achieved remarkable results in complex speech recognition tasks. The purpose of this thesis is to explore the application of various deep neural network models to continuous speech recognition tasks, through two main pieces of work.

(1) A method of acoustic feature extraction based on the auto-encoder structure was studied. For speech recognition in complex noisy application environments, such as those with multi-source interference, we proposed a stacked contractive-denoising auto-encoder model (CDAE) whose extracted acoustic features have better anti-interference performance and representational capability. Comparison experiments on two standard corpora examined the effect of network depth and of different auto-encoder structures on feature extraction. The experimental results indicated that the stacked CDAE can, through its own generalization ability, extract deeper features that represent the speech signal, and that it outperforms other auto-encoder models, with a 2% to 4% increase in recognition accuracy.

(2) An end-to-end speech recognition process based on the recurrent neural network was studied, using the CTC training criterion and the attention-mechanism training criterion respectively. On the basis of the bidirectional recurrent neural network, we established a model that is trained as a single sequence mapping from the acoustic features (input) to the different output units. In the comparison experiment with the traditional recognition system, the results of training on 14 hours and on 80 hours of the WSJ corpus showed that when the end-to-end speech recognition process is trained with limited resources, the advantages of the model are not prominent. Overall, however, if a relatively adequate training corpus is available, or if additional support such as language models and other text corpora is taken into account, a significant reduction in word error rate can be achieved.
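The end-to-end results above are reported as word error rate (WER). As a point of reference, the following is a minimal sketch of the standard WER definition (word-level edit distance divided by the number of reference words). It is not taken from the thesis; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Illustrative sketch only: tokenization by whitespace is an assumption,
    not the scoring procedure used in the thesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.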
Keywords/Search Tags: Deep Neural Networks, Acoustic Model, Acoustic Feature, Auto-encoder, End-to-End Recognition