
Study On Robust Voice Activity Detection Using CNN Encoder-decoder Based On MTF Concept Under Noisy Conditions

Posted on: 2021-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: N Li
GTID: 2518306548485814
Subject: Computer technology
Abstract/Summary:
Voice activity detection (VAD) is a technique for determining the start and end points of speech in a signal. However, previous VAD methods are usually not robust to noise. In this study, the modulation transfer function (MTF) concept is combined with a deep neural network architecture to improve VAD accuracy in noisy environments. The influence of noise on speech can be regarded as an MTF. Theoretically, if the MTF of speech in a noisy environment can be estimated, the influence of the noise can be removed with an inverse MTF filter. We propose a method that recovers the time-domain power envelope using the MTF concept. Applying a threshold to the recovered temporal envelope then classifies the signal as speech or non-speech. In the MTF concept, eliminating the effect of additive noise requires an estimate of the global signal-to-noise ratio (gSNR). However, because the additive noise and speech components are mixed, it is very difficult to estimate the gSNR directly from the original time-domain signal. This study therefore estimates the gSNR using subband signal processing. The proposed gSNR estimator consists mainly of four parts: subband speech signal processing, a subband threshold computing unit, a subband power computing unit, and a gSNR computing unit. For the subband threshold, because deep neural networks handle nonlinear problems well, a CNN encoder-decoder (C-ED) is used to estimate the threshold that separates speech from noise in each subband signal. The final gSNR is then computed from the obtained thresholds. Experiments under stationary and non-stationary noise conditions show that the proposed MTF method outperforms previous MTF-based VAD methods. The method effectively reduces the adverse effects of noise on VAD, especially at low SNRs and in non-stationary noise.
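The envelope-recovery and thresholding steps described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract gives no formulas: the code uses the textbook MTF for stationary additive noise, m = 1 / (1 + 10^(-gSNR/10)), treats the noise power as constant over time, and uses a hypothetical relative threshold. The function names are illustrative, and the gSNR is taken as given rather than estimated by the C-ED subband method.

```python
import math


def mtf_from_gsnr(gsnr_db):
    """MTF of stationary additive noise (assumed textbook form).

    Equals P_speech / (P_speech + P_noise), i.e. the factor by which
    noise reduces the modulation depth of the power envelope.
    """
    return 1.0 / (1.0 + 10.0 ** (-gsnr_db / 10.0))


def recover_power_envelope(noisy_env, gsnr_db):
    """Inverse-MTF step: subtract the constant noise power implied by gSNR.

    noisy_env is a list of power-envelope samples of the noisy signal.
    Negative results are clamped to zero because power cannot be negative.
    """
    m = mtf_from_gsnr(gsnr_db)
    mean_power = sum(noisy_env) / len(noisy_env)
    noise_power = (1.0 - m) * mean_power  # assumed constant noise floor
    return [max(p - noise_power, 0.0) for p in noisy_env]


def vad_decisions(noisy_env, gsnr_db, rel_threshold=0.1):
    """Mark a sample as speech when the restored envelope exceeds
    rel_threshold times its peak (rel_threshold is a hypothetical choice)."""
    restored = recover_power_envelope(noisy_env, gsnr_db)
    peak = max(restored)
    if peak == 0.0:
        return [False] * len(restored)
    return [p > rel_threshold * peak for p in restored]


# Toy example: speech power 4 in half the frames, noise power 1 everywhere,
# so mean speech power is 2 and the true gSNR is 10*log10(2/1).
noisy = [1.0, 1.0, 5.0, 5.0, 1.0, 1.0, 5.0, 5.0]
gsnr = 10.0 * math.log10(2.0)
print(vad_decisions(noisy, gsnr))
# [False, False, True, True, False, False, True, True]
```

In this toy case the recovery is exact because the gSNR is known. With real signals the quality of the decision depends on how well the gSNR is estimated, which is the part the thesis addresses with subband processing and the C-ED.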
Keywords/Search Tags: VAD, MTF, CNN, encoder-decoder