Noise Robust Speech Recognition Research Based On Regression Deep Neural Network

Posted on: 2017-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Shi
GTID: 2308330485451785
Subject: Applied Mathematics
Abstract/Summary:
With the rapid development of the mobile Internet, speech recognition applications have become increasingly popular, and because it is easy to use, voice interaction is gradually being accepted by the public. However, environmental noise and the channel diversity of different devices still restrict the large-scale deployment of automatic speech recognition (ASR) systems. In recent years, deep neural networks (DNNs) have been applied successfully to ASR and are more robust than conventional methods, but DNN-based recognition still performs poorly in noisy environments. Furthermore, because of the characteristics of DNN models, traditional noise-robustness methods are difficult to apply directly. To address this, this thesis makes the following contributions.

First, using 800 hours of large-scale training data, we experiment with different regression neural network structures, including different input and output structures and a DNN-Autoencoder structure. By comparing their performance, we identify the best structure for noisy speech recognition. The network exploits its strong nonlinear modeling capacity to learn a mapping from noisy speech features to clean speech features, and the processed features are then passed to the recognizer. The best DNN structure reduces the word error rate (WER) from 23.8% to 18.2%, a 23.5% relative improvement.

Second, this thesis applies a mixture density network (MDN) to denoising for speech recognition for the first time. The targets are modeled by a mixture of Gaussian distributions, and the network is trained by maximizing the likelihood of the targets, which gives the MDN a stronger fitting ability than the DNN. Experiments show that the MDN yields a 5% improvement in WER over the DNN.

Finally, we apply DNNs to far-field speech recognition and bandwidth expansion. Far-field speech typically contains convolutional noise; experiments show that the DNN brings a relative improvement of 55.5%.
Training the DNN jointly with the back-end acoustic model yields a further relative WER improvement of 4.86%. DNNs can also be applied to bandwidth expansion: this thesis models the relationship between 8 kHz and 16 kHz speech features, and the output can be used for speech recognition. Experiments show that, compared with the 8 kHz recognition system, the method keeps the performance loss within a tolerable 5% while halving the training resources.
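The difference between the training objectives of the first two contributions (regression DNN versus MDN) can be made concrete. The following is a minimal NumPy sketch, not the thesis's implementation, under stated assumptions: the regression DNN is assumed to use mean squared error (the abstract does not name its loss), the MDN components are assumed to be diagonal Gaussians, and `mdn_head` is a hypothetical output layer that splits one affine map into mixture logits, means and log standard deviations for D-dimensional feature frames.

```python
import numpy as np


def logsumexp(x, axis, keepdims=False):
    """Numerically stable log(sum(exp(x))) along `axis`."""
    m = np.max(x, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return out if keepdims else np.squeeze(out, axis=axis)


def mse_loss(pred, target):
    """Assumed regression-DNN objective: squared error between enhanced and
    clean feature vectors, summed over dimensions, averaged over frames."""
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))


def mdn_head(h, W, b, K, D):
    """Hypothetical MDN output layer: one affine map whose output is split
    into K mixture logits, K*D means and K*D log standard deviations."""
    out = h @ W + b                                   # (N, K * (1 + 2D))
    N = out.shape[0]
    logits = out[:, :K]                               # (N, K)
    means = out[:, K:K + K * D].reshape(N, K, D)      # (N, K, D)
    log_sigmas = out[:, K + K * D:].reshape(N, K, D)  # (N, K, D)
    return logits, means, log_sigmas


def mdn_nll(logits, means, log_sigmas, target):
    """Negative log-likelihood of clean targets (N, D) under a mixture of
    K diagonal Gaussians, averaged over frames (maximum-likelihood loss)."""
    D = target.shape[1]
    log_pi = logits - logsumexp(logits, axis=1, keepdims=True)  # log-softmax
    z = (target[:, None, :] - means) / np.exp(log_sigmas)       # (N, K, D)
    log_gauss = (-0.5 * np.sum(z ** 2, axis=2)
                 - np.sum(log_sigmas, axis=2)
                 - 0.5 * D * np.log(2 * np.pi))                 # (N, K)
    return float(-np.mean(logsumexp(log_pi + log_gauss, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, H, K, D = 8, 16, 2, 3          # frames, hidden units, mixtures, feature dim
    h = rng.normal(size=(N, H))       # stand-in for the last hidden layer
    W = 0.1 * rng.normal(size=(H, K * (1 + 2 * D)))
    b = np.zeros(K * (1 + 2 * D))
    clean = rng.normal(size=(N, D))   # stand-in for clean-speech features
    print("MDN NLL:", mdn_nll(*mdn_head(h, W, b, K, D), clean))
```

With a single component and unit variance, the MDN loss reduces to half the squared error plus a constant, so in this sketch the MDN strictly generalizes the MSE regression: it can place several modes and per-dimension variances on the clean target instead of a single point estimate, which is one way to read the claim of stronger fitting ability.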
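The 23.5% quoted for the best DNN structure is a relative reduction in WER, which follows from the two absolute error rates given in the abstract:

```latex
\frac{\mathrm{WER}_{\text{baseline}} - \mathrm{WER}_{\text{DNN}}}{\mathrm{WER}_{\text{baseline}}}
  = \frac{23.8\% - 18.2\%}{23.8\%}
  = \frac{5.6}{23.8}
  \approx 23.5\%
```

The same relative reading presumably applies to the 55.5% and 4.86% far-field figures, for which the abstract does not give absolute WERs.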
Keywords/Search Tags: Regression Neural Network, Denoising Speech Recognition, Mixture Density Network, LVCSR, Bandwidth Expansion, Far-field Speech Recognition