Noise Robust Speech Recognition Research Based On Regression Deep Neural Network

Posted on:2017-05-19

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Shi

Full Text:PDF

GTID:2308330485451785

Subject:Applied Mathematics

Abstract/Summary:


With the rapid development of the mobile Internet, speech recognition applications have become increasingly popular. Because it is easy to use, voice interaction is gradually being accepted by the public. However, environmental noise and the channel diversity of different devices restrict the large-scale application of automatic speech recognition (ASR) systems. In recent years, Deep Neural Networks (DNNs) have been successfully applied to ASR and are more robust than conventional methods, but DNNs still perform poorly in noisy environments. Furthermore, because of the characteristics of the DNN model, traditional methods are difficult to apply directly. To this end, this thesis makes the following contributions:

First, using large-scale training data of 800 hours, we experiment with different regression neural network structures, including the input and output structure and a DNN-Autoencoder structure. We compare the performance of these regression networks and identify the best structure for noisy speech recognition. The network uses its strong capacity to learn the nonlinear mapping between noisy and clean speech features, and the processed features are then used for speech recognition. The best DNN structure reduces the word error rate (WER) from 23.8% to 18.2%, a relative improvement of 23.5%.

Second, this thesis applies the mixture density network (MDN) to denoising for speech recognition for the first time. The targets are modeled by a Gaussian mixture distribution. Trained by maximizing the likelihood, the MDN has a stronger fitting ability than the DNN. Experiments show that, compared with the DNN, the MDN improves the word error rate by 5%.

Finally, we apply the DNN to far-field speech recognition and bandwidth expansion. Far-field speech typically contains convolutional noise; experiments in this thesis show that the DNN brings a relative improvement of 55.5%. Training the DNN jointly with the back-end acoustic model in turn brings a relative WER improvement of 4.86%. The DNN can also be applied to bandwidth expansion: this thesis models the relationship between 8k and 16k speech features, and the output can be used for speech recognition. Experiments show that, compared with the 8k recognition system, the method keeps the loss in recognition performance within a tolerable range of 5% or less, while halving the training resources.
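
The abstract says the MDN models its targets with a Gaussian mixture and is trained by maximum likelihood. A minimal sketch of that objective for a single frame is given below. It assumes diagonal covariances and strictly positive mixture weights; the function name and argument layout are hypothetical and are not taken from the thesis.

```python
import math


def mdn_negative_log_likelihood(target, weights, means, variances):
    """Negative log-likelihood of one target vector under a Gaussian mixture.

    Assumptions (illustrative, not from the thesis): diagonal covariances,
    and mixture weights that are strictly positive and sum to 1.

    target:    list of D floats (e.g. one clean-speech feature frame)
    weights:   list of K mixture weights
    means:     K lists of D floats
    variances: K lists of D positive floats
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log density of a diagonal Gaussian, summed over dimensions.
        log_gauss = 0.0
        for x, m, v in zip(target, mu, var):
            log_gauss += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        log_terms.append(math.log(w) + log_gauss)

    # Log-sum-exp over components for numerical stability.
    top = max(log_terms)
    log_likelihood = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return -log_likelihood
```

In a full MDN the weights, means and variances would be produced by the network's output layer for each noisy input frame, and this quantity would be averaged over frames and minimized during training.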

Keywords/Search Tags:

Regression Neural Network, Denoising Speech Recognition, Mixture Density Network, LVCSR, Bandwidth Expansion, Far-field Speech Recognition


Related items

1	Far-Field Speech Recognition Methods Research Based On Beamforming And DNN
2	Research On Deep Learning Based Far-Field Speech Recognition
3	Research On Speech Emotion Recognition Based On The Fusion Of ANN And GMM
4	Application Of Convolutional Neural Network In Large Vocabulary Continuous Speech Recognition
5	Research On And Implementation Of Continuous Speech Recognition System
6	Studying On Chinese Digital Speech Recognition Technology Based On Neural Network
7	Research On Speech Bandwidth Extension Using Deep Neural Network
8	Study Of Speech Recognition Algorithm Based On HMM And Neural Network
9	Research On Phone Feature Recognition Based On Deep Learning
10	Research On Chinese Speech Recognition And Emotion Recognition Based On Neural Network