Research For Continuous Speech Recognition Based On Deep Neural Networks

Posted on:2019-04-21

Degree:Master

Type:Thesis

Country:China

Candidate:M H Li

Full Text:PDF

GTID:2428330548961222

Subject:Engineering

Abstract/Summary:

Speech recognition is an important branch of pattern recognition research and, in the era of artificial intelligence, an interaction technology that attracts close attention. Over the past 50 years, traditional speech recognition technology has gradually matured, and with the wide attention paid to deep neural network theory in the early 21st century, speech recognition has developed rapidly. From theoretical research to practical products, various deep neural network models have achieved remarkable results on complex speech recognition tasks. This thesis explores the application of deep neural network models to continuous speech recognition through two main pieces of work.

(1) A method of acoustic feature extraction based on the auto-encoder structure was studied. For speech recognition in complex noisy environments, such as those with multi-source interference, we proposed a stacked contractive-denoising auto-encoder (CDAE) model whose extracted acoustic features are more robust to interference and have greater representational capability. Comparative experiments on two standard corpora examined, respectively, the effect of network depth and the effect of different auto-encoder structures on feature extraction. The results indicated that, through its generalization ability, the stacked CDAE extracts deeper features that represent the speech signal, and that it outperforms the other auto-encoder models, increasing recognition accuracy by 2% to 4%.

(2) An end-to-end speech recognition process based on recurrent neural networks was studied, using the CTC (Connectionist Temporal Classification) training criterion and the attention-mechanism training criterion respectively. Based on a bidirectional recurrent neural network, we built a model that is trained as a single, holistic sequence mapping from the acoustic features (input) to the various output units. In comparison experiments with a traditional recognition system, training on 14 hours and on 80 hours of the WSJ corpus showed that when the end-to-end process is trained with limited resources, the model's advantages are not prominent. Overall, however, if a relatively adequate training corpus is available, or if additional resources such as language models and other text corpora can be used, the word error rate can be reduced significantly.
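
The abstract does not state the CDAE objective. A minimal NumPy sketch of one layer, assuming the common combination of a denoising criterion (reconstruct the clean frame from a corrupted one) with a contractive penalty (squared Frobenius norm of the encoder Jacobian, here evaluated at the corrupted input), a sigmoid encoder, a linear decoder and additive Gaussian corruption. The function names and the parameters `lam` and `noise_std` are hypothetical, not taken from the thesis.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def contractive_penalty(h, W):
    """Per-sample ||dh/dx||_F^2 for a sigmoid layer h = sigmoid(W x + b).

    The Jacobian is diag(h * (1 - h)) @ W, so its squared Frobenius norm is
    sum_j (h_j * (1 - h_j))**2 * sum_i W[j, i]**2.
    """
    w_row_sq = np.sum(W ** 2, axis=1)                 # (n_hidden,)
    return np.sum((h * (1.0 - h)) ** 2 * w_row_sq, axis=1)


def cdae_loss(x, W, b, W_dec, b_dec, lam=0.1, noise_std=0.1, rng=None):
    """Batch-mean contractive-denoising loss for ONE auto-encoder layer.

    x:            (batch, n_in) clean acoustic feature frames
    W, b:         encoder weights (n_hidden, n_in), bias (n_hidden,)
    W_dec, b_dec: decoder weights (n_in, n_hidden), bias (n_in,)
    lam:          weight of the contractive penalty (hypothetical default)
    noise_std:    std of the additive Gaussian corruption (hypothetical)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Denoising: corrupt the input (Gaussian noise is an assumption here).
    x_noisy = x + noise_std * rng.standard_normal(x.shape)
    # Sigmoid encoder applied to the corrupted frames.
    h = sigmoid(x_noisy @ W.T + b)                    # (batch, n_hidden)
    # Linear decoder, since acoustic features are real-valued.
    x_rec = h @ W_dec.T + b_dec                       # (batch, n_in)
    # Reconstruction error is measured against the CLEAN input.
    recon = np.sum((x_rec - x) ** 2, axis=1)
    return float(np.mean(recon + lam * contractive_penalty(h, W)))
```

In a stacked model, one plausible scheme is greedy layer-wise training: train the first layer with this loss, then use its hidden activations on clean input as the "frames" for the next layer, and finally feed the top-layer activations to the recognizer as acoustic features.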
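
The CTC training criterion named in the abstract scores a target label sequence by summing the probabilities of every frame-level path that collapses to it (merge repeated symbols, then drop blanks). A minimal NumPy sketch, assuming per-frame log-softmax outputs from a network such as the bidirectional RNN and output index 0 reserved for the CTC blank, computes the CTC negative log-likelihood with the standard forward (alpha) recursion and adds greedy best-path decoding. The function names and the blank index are illustrative, not the thesis's implementation.

```python
import numpy as np

BLANK = 0  # assumption: output index 0 is the CTC blank symbol


def ctc_neg_log_likelihood(log_probs, labels, blank=BLANK):
    """Return -log p(labels | x) using the CTC forward (alpha) recursion.

    log_probs: (T, K) per-frame log-softmax scores, e.g. from a BiRNN
    labels:    target label indices without blanks, e.g. [3, 1, 4]
    """
    T = log_probs.shape[0]
    # Extended sequence: blank, l1, blank, l2, ..., lU, blank.
    ext = [blank]
    for c in labels:
        ext += [c, blank]
    S = len(ext)
    alpha = np.full((T, S), -np.inf)
    # A path may start with the leading blank or with the first label.
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                           # stay on the same symbol
            if s >= 1:
                a = np.logaddexp(a, alpha[t - 1, s - 1])  # advance by one
            # Skip over a blank, unless that would merge two equal labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = np.logaddexp(a, alpha[t - 1, s - 2])
            alpha[t, s] = a + log_probs[t, ext[s]]
    # Valid paths end on the last label or on the trailing blank.
    total = alpha[T - 1, S - 1]
    if S > 1:
        total = np.logaddexp(total, alpha[T - 1, S - 2])
    return -float(total)


def ctc_greedy_decode(log_probs, blank=BLANK):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    out, prev = [], None
    for k in np.argmax(log_probs, axis=1).tolist():
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out
```

The nested loops cost O(T·S) per utterance and are written for readability; in practice a framework implementation such as PyTorch's `torch.nn.CTCLoss` computes the same quantity together with its gradient.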

Keywords/Search Tags:

Deep Neural Networks, Acoustic Model, Acoustic Feature, Auto-encoder, End-to-End Recognition

Related items

1	The Research Of Uyghur Acoustic Model Based On Deep Neural Network
2	Research On Speech Recognition Based On Convolutional Neural Networks
3	Uyghur Speech Recognition Based On Deep Recurrent Neural Network
4	Environment Sound Recognition Based On Deep Learning Methods
5	Research On Speech Recognition Method Based On Deep Learning
6	Reasearch Into Speech Recognition Based On Deep Learning
7	Research On The Compression Of Neural Networks Based Acoustic Models For Speech Recognition
8	Acoustic Model Of Chinese Speech Recognition Based On DNN
9	Research On Phone Feature Recognition Based On Deep Learning
10	Deep Auto-encoder Framework For SAR Images Change Detection