Research On Speech Recognition Based On Deep Learning In Noise Environment

Posted on: 2018-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
GTID: 2348330542470288
Subject: Software engineering
Abstract/Summary:
The main goal of speech recognition technology is to use computers to convert human speech signals into machine-readable output, such as character sequences or binary codes. With the rapid development of the mobile Internet, applications such as auxiliary input, machine translation, human-computer dialogue, and navigation have placed higher demands on the real-time performance and accuracy of speech recognition. Traditional speech recognition work mainly addresses acoustic feature extraction, acoustic model construction, model training, language model construction, and decoder construction in noise-free or weak-noise environments, yet most real speech is produced in the presence of noise. Research on speech recognition under strong noise is therefore worthwhile. This thesis takes speech recognition in strong-noise environments (SNR less than or equal to 20 dB) as its research target. Based on the principles of deep learning, it uses noise injection as the training mode and the Kaldi speech recognition system as the platform to study speech with strong noise. The resulting methods and system achieved satisfactory results. Specifically, the work is as follows.

1. Feature-space maximum likelihood linear regression (fMLLR) features are proposed as the speech training features for strong-noise environments. The fMLLR features are obtained by training a model on MFCC (Mel-Frequency Cepstral Coefficients) features, estimating the fMLLR transform, and applying that transform; they can be used to maximize cross entropy. The transform is applied to each input vector of the deep neural network, that is, to the features of each frame, so fMLLR features generalize better and can adapt to complex noise scenes. Experiments show that, at a signal-to-noise ratio of 0 dB, the fMLLR feature reduces the system's word error rate by 7.18% compared with the MFCC feature.

2. A noise training method based on pre-training of a deep belief network (DBN) is proposed. Unlike training methods for clean speech, the training process for speech with strong noise is divided into two parts. First, noise is injected into the training set, and the extracted MFCC features are used to train a GMM-HMM (Gaussian mixture model-hidden Markov model) model, which is then used to extract the fMLLR features. Second, the fMLLR features are used to pre-train the DBN. The pre-trained weights are then used to initialize DNN (deep neural network) training, which gives the DNN a better starting point. In addition, pre-training makes use of a large amount of unlabeled data, so that DNN training only needs to make small adjustments to the weights. Experiments show that, at a signal-to-noise ratio of 0 dB, DBN pre-training reduces the system's word error rate by 4.1%.
Keywords/Search Tags: speech recognition, deep learning, noise training, fMLLR features
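The abstract describes injecting noise into the training set at a given signal-to-noise ratio (for example 0 dB) but does not say how the mixing is done. The following is a minimal sketch of one common approach, not the thesis's actual procedure: the noise is scaled so that the ratio of speech power to noise power matches the target SNR. The function name `mix_at_snr` and the choice to loop the noise to the length of the speech are assumptions made for illustration.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the thesis or Kaldi.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if speech.size == 0 or noise.size == 0:
        raise ValueError("speech and noise must be non-empty")

    # Loop the noise if it is shorter than the speech, then trim to length.
    reps = int(np.ceil(speech.size / noise.size))
    noise = np.tile(noise, reps)[: speech.size]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0.0 or noise_power == 0.0:
        raise ValueError("speech and noise must have non-zero power")

    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

Calling `mix_at_snr(clean, babble, 0.0)` on two sample arrays would give a mixture in which the speech and the scaled noise have equal power. In practice the mixed waveforms would be written back to audio files before feature extraction.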