
Very Deep Convolutional Neural Networks For Noise Robust Speech Recognition

Posted on: 2018-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: M X Bi
Full Text: PDF
GTID: 2428330590977663
Subject: Computer Science and Technology
Abstract/Summary:
Deep learning has flourished in recent years and now drives progress in artificial intelligence across many application fields. Speech recognition, one of its most representative applications, has played a key role in this revolution. In scenarios where the signal-to-noise ratio (SNR) is high and the speaker is close to the microphone, such as voice search and chatbots, speech recognition performs well enough for practical use. In scenarios where the SNR is low or the speaker is far from the microphone, such as meeting rooms and public spaces, performance is far from satisfactory due to noise and reverberation. In this thesis, we introduce very deep convolutional neural networks (CNNs) to noise-robust speech recognition tasks. We discuss extending both input dimensions, how this extension relates to the depth of the network, and its impact on performance. We also discuss appropriate pooling and zero-padding strategies and the number of input feature maps for noise-robust speech recognition. After finding the best very deep CNN structure, we add residual learning to the architecture and observe a further improvement. We find that very deep CNNs have the advantages of a low parameter count and fewer required training iterations. Beyond the experiments, we also analyze why very deep CNNs are inherently noise robust. We verify the effectiveness of very deep CNNs on two typical noise-robust speech recognition tasks. On Aurora4, our best model achieves a word error rate (WER) of 8.36%, a 22% relative improvement over LSTM-RNN, and beats the previous best system by 14%. On AMI, our best model achieves a 4% relative improvement over LSTM-RNN.
Keywords/Search Tags:Very Deep CNN, ResNet, Noise Robust, Speech Recognition, Acoustic Model
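The relative improvements quoted in the abstract can be read with a small amount of arithmetic. The sketch below is illustrative only and is not taken from the thesis: it assumes "relative improvement" means relative WER reduction, (baseline - new) / baseline, and that the 14% figure is relative as well. The function names are hypothetical. Because the abstract's percentages are rounded, any baseline derived this way is approximate.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the fraction by which new_wer is lower than baseline_wer.

    Assumed definition (not stated in the abstract):
    (baseline - new) / baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline_wer(new_wer: float, relative_reduction: float) -> float:
    """Back-calculate the baseline WER implied by a relative reduction.

    Inverts the formula above: baseline = new / (1 - reduction).
    """
    if not 0 <= relative_reduction < 1:
        raise ValueError("relative_reduction must be in [0, 1)")
    return new_wer / (1.0 - relative_reduction)


if __name__ == "__main__":
    best_wer = 8.36  # Aurora4 WER (%) reported in the abstract

    # Approximate baselines implied by the rounded percentages in the abstract.
    lstm_wer = implied_baseline_wer(best_wer, 0.22)
    prev_best_wer = implied_baseline_wer(best_wer, 0.14)

    print(f"Implied LSTM-RNN WER: about {lstm_wer:.2f}%")
    print(f"Implied previous-best WER: about {prev_best_wer:.2f}%")
```

These back-calculated values are only a reading aid; the exact baseline numbers are in the full text.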