Research On Robust Speech Recognition In Noise Environment

Posted on: 2019-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Tang
Full Text: PDF
GTID: 2428330596960612
Subject: Signal and Information Processing
Abstract/Summary:
Voice is the most convenient means of human communication. With the development of science and technology, human-computer interaction built around speech recognition has become increasingly popular. After more than 60 years of development, speech recognition technology has improved greatly. In a quiet environment, the performance of a speech recognition system is close to the human level. In a noisy environment, however, recognition accuracy drops sharply. Making speech recognition systems more robust to noise has therefore become one of the keys to their large-scale use. This thesis addresses the degradation of speech recognition performance in noisy environments by applying speech enhancement algorithms to denoise the input speech, which raises the quality of the input and improves the robustness of the recognition system. Traditional speech enhancement algorithms, however, can distort the speech. It is therefore necessary to adapt the speech enhancement algorithms to the characteristics of automatic speech recognition systems. The main work of this thesis is as follows.

(1) An improved Wiener filtering algorithm based on the masking effect of the human ear and the harmonic recovery principle is studied. First, the auditory masking effect of the human ear is studied and the method for computing the masking threshold is introduced. Second, the minimum statistics (MS) noise estimation algorithm and the minima-controlled recursive averaging (MCRA) noise estimation method are studied. The thesis then proposes an improvement in three steps: first, adjust the estimate of the noise energy spectrum together with the masking threshold; second, use the harmonic recovery principle to recover the lost speech components; third, to reduce distortion in the output speech, use the a priori signal-to-noise ratio as the criterion for post-processing. Finally, simulation experiments show that the proposed improvements do improve speech quality.

(2) An improved log-domain MMSE amplitude spectrum estimator is studied. First, the principles of the linear MMSE amplitude spectrum estimator and the log-domain MMSE amplitude spectrum estimator are studied. Experimental comparison shows that the log-domain MMSE amplitude spectrum estimator performs better. Then, because residual noise and speech distortion still remain in speech processed by the log-domain MMSE amplitude spectrum estimator, the frame SNR is introduced. Based on the frame SNR, a noise control factor, a speech energy minimum control factor, and a residual noise suppression factor are used to improve the gain function of the log-domain MMSE amplitude spectrum estimator. Finally, simulation experiments verify that the improvements in this chapter significantly reduce residual noise and speech distortion.

(3) The application of deep neural network technology to speech enhancement is studied. First, a speech enhancement algorithm based on a deep belief network is studied. A regression model is trained and its performance on a test set with unmatched noise is verified. Second, the thesis considers two mismatches: between the noisy speech constructed for network training and the speech met in actual applications, and between the noise library and the noise in real scenes. To address them, it borrows the speech-signal perturbation strategy used in speech recognition to improve model generalization. Here the noise is perturbed along the frequency axis, so that a limited set of noise recordings exhibits as many characteristics as possible, which enriches the training set and increases the generalization ability of the model. Third, because errors at different frequency bins carry different weight, a frequency weighting coefficient is constructed from the a priori signal-to-noise ratio and used to improve the loss function. The model is then examined and found to contain considerable redundancy. Particularly small weights are reset to zero, the corresponding connections are removed from the network, and the pruned network is retrained to recover the same performance; this improvement is proposed and its effect is verified.

Finally, all the speech enhancement algorithms improved in this thesis are evaluated on speech recognition systems built with the latest speech recognition frameworks, CMU Sphinx and Kaldi, and compared with traditional speech enhancement algorithms. Experiments show that the improved speech enhancement algorithms improve the performance of automatic speech recognition systems in noisy environments. The neural-network-based speech enhancement algorithm works best.
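
The pruning step described in part (3) (zero the smallest weights, drop those connections, retrain) can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the abstract gives no threshold, network shape, or training framework, so the function names `prune_small_weights` and `masked_update`, the threshold `0.05`, the learning rate, and the toy weight matrix are all assumptions made for illustration.

```python
import numpy as np

def prune_small_weights(weights, threshold):
    """Zero every weight whose magnitude is below `threshold` (hypothetical value).

    Returns the pruned weights and a boolean mask of the surviving connections.
    """
    mask = np.abs(weights) >= threshold
    return np.where(mask, weights, 0.0), mask

def masked_update(weights, grad, mask, lr=0.1):
    """One gradient-descent step for retraining that keeps pruned connections at zero."""
    return np.where(mask, weights - lr * grad, 0.0)

# Toy weight matrix standing in for one layer of the enhancement network (assumed).
W = np.array([[0.80, -0.01, 0.30],
              [0.02, -0.60, 0.005]])

W_pruned, mask = prune_small_weights(W, threshold=0.05)
sparsity = 1.0 - mask.mean()  # fraction of connections removed

# A retraining step with a placeholder gradient: only surviving weights change.
W_retrained = masked_update(W_pruned, np.ones_like(W), mask)
```

Keeping the mask and reapplying it on every update is one simple way to make sure removed connections stay removed while the remaining weights are retrained; other pruning schedules are possible.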
Keywords/Search Tags: speech recognition, speech enhancement, masking effect, harmonic recovery, frame SNR, deep neural network