
Research on Robustness of Voice Endpoint Detection

Posted on: 2020-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: W Chen
GTID: 2438330590957605
Subject: Electronic and communication engineering
Abstract/Summary:
With the advance of information and intelligent technology, techniques such as automatic speech recognition (ASR) and automatic speech enhancement (ASE) are increasingly applied in daily life. With the rise of the Internet of Things, voice, as a means of both interaction and biometric identification, will provide strong technical support for future consumer electronics. Voice activity detection (VAD) distinguishes speech segments from non-speech segments in an audio signal, and it directly affects the performance of speech processing technologies such as ASR and ASE. A VAD algorithm can be divided into three parts: (1) pre-processing of the speech signal, mainly pre-emphasis, framing and windowing; (2) feature extraction, mainly frequency-domain and time-domain features; and (3) a classification algorithm that separates speech from non-speech. The detection performance of current VAD algorithms degrades sharply when the signal-to-noise ratio (SNR) drops and the noise environment is complex. Starting from the three aspects of pre-processing, feature extraction and discriminant model selection, this thesis tries a series of improvements to existing VAD algorithms in order to raise the accuracy and real-time performance of endpoint detection at low SNR.

First, to address the weakness of the VAD algorithm based on the double-threshold decision method in low-SNR environments, this thesis uses the Kullback-Leibler (KL) divergence of the power spectral density of the speech signal as a feature for discriminating speech from non-speech. Combining this feature with an order statistics filter (OSF) and an adaptive threshold, a VAD algorithm based on KL divergence with an adaptive threshold is proposed.

Second, traditional discriminant models cannot exploit the long- and short-term context of the speech signal. A long short-term memory (LSTM) network is therefore trained as a speech/non-speech discriminant model, making full use of the temporal information in the speech signal. Based on the LSTM network, the KL divergence feature, Mel-frequency cepstral coefficient (MFCC) features and the OSF, a speech endpoint detection algorithm based on an LSTM neural network is proposed.

Finally, a data annotation script for endpoint labeling is implemented in Python. Using the annotated data, the performance of several typical VAD algorithms and of the two improved algorithms is analyzed in simulation, and their mathematical models are given. The experimental results show that the two proposed VAD algorithms achieve higher detection accuracy, better robustness and better real-time performance.
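The KL-divergence, adaptive-threshold pipeline summarized above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the thesis's implementation: the 25 ms / 10 ms framing at 16 kHz, the pre-emphasis coefficient 0.97, the Hamming window, estimating the noise spectrum from the first 20 frames (assumed to be non-speech), the median as the order statistic, and the "noise mean + k × std" threshold that is updated only on frames judged to be noise are all illustrative choices. The function names are hypothetical.

```python
import numpy as np

# Illustrative sketch only: every parameter value below is an assumption, not from the thesis.


def preprocess(x, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasis, framing and Hamming windowing (25 ms / 10 ms frames at 16 kHz)."""
    x = np.asarray(x, dtype=float)
    x = np.append(x[0], x[1:] - alpha * x[:-1])          # y[n] = x[n] - alpha * x[n-1]
    if len(x) < frame_len:                                # pad very short inputs to one frame
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)


def kl_feature(frames, n_init=20, eps=1e-12):
    """KL divergence D(p_t || q) between each frame's normalised PSD and a noise reference q."""
    psd = np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps
    p = psd / psd.sum(axis=1, keepdims=True)              # treat each frame's PSD as a distribution
    q = p[:n_init].mean(axis=0)                           # assumption: leading frames are noise only
    return np.sum(p * np.log(p / q), axis=1)


def order_statistic_filter(feat, half_width=3, quantile=0.5):
    """Replace each value by an order statistic (default: the median) of its neighbourhood."""
    padded = np.pad(feat, half_width, mode="edge")
    idx = np.arange(len(feat))[:, None] + np.arange(2 * half_width + 1)[None, :]
    return np.quantile(padded[idx], quantile, axis=1)


def kl_adaptive_vad(x, n_init=20, k=3.0, beta=0.98, half_width=3):
    """Frame-level speech/non-speech decisions (True = speech)."""
    frames = preprocess(x)
    raw = kl_feature(frames, n_init)
    smooth = order_statistic_filter(raw, half_width)
    # Initial noise statistics from the raw feature of the leading frames (assumption).
    mu, var = raw[:n_init].mean(), raw[:n_init].var()
    speech = np.zeros(len(smooth), dtype=bool)
    for t, f in enumerate(smooth):
        speech[t] = f > mu + k * np.sqrt(var)              # threshold = noise mean + k * noise std
        if not speech[t]:                                  # adapt only on frames judged to be noise
            mu = beta * mu + (1 - beta) * f
            var = beta * var + (1 - beta) * (f - mu) ** 2
    return speech


# Synthetic check at 16 kHz: 0.5 s silence, a 0.5 s 1 kHz tone standing in for speech,
# 0.5 s silence, with white noise added throughout.
rng = np.random.default_rng(0)
fs = 16000
tone = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(fs // 2) / fs)
x = np.concatenate([np.zeros(fs // 2), tone, np.zeros(fs // 2)])
x += 0.05 * rng.standard_normal(len(x))
speech = kl_adaptive_vad(x)                                # 148 frame decisions
```

Normalizing each frame's PSD makes the feature measure spectral shape rather than loudness, the median filter suppresses isolated spikes before the decision, and restricting threshold updates to noise frames keeps speech from inflating the noise statistics.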
Keywords/Search Tags: VAD, LSTM, KL divergence, order statistics filter, low SNR
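The abstract mentions a Python script for endpoint annotation and an accuracy-based comparison of VAD algorithms. A hypothetical helper of that kind might convert between frame-level speech flags and (start, end) segments in seconds and score predictions by frame accuracy. The sketch below assumes a 10 ms hop and treats frame i as covering the interval [i·hop, (i+1)·hop); the function names and the segment convention are illustrative, not taken from the thesis.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical annotation helpers; frame i is assumed to cover [i * hop_s, (i + 1) * hop_s).
Segment = Tuple[float, float]


def frames_to_segments(labels: Sequence[bool], hop_s: float = 0.01) -> List[Segment]:
    """Merge runs of speech frames into (start, end) segments in seconds."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, is_speech in enumerate(list(labels) + [False]):   # sentinel closes a trailing run
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((round(start * hop_s, 6), round(i * hop_s, 6)))
            start = None
    return segments


def segments_to_frames(segments: Sequence[Segment], n_frames: int, hop_s: float = 0.01) -> List[bool]:
    """Expand (start, end) segments back into per-frame speech flags."""
    labels = [False] * n_frames
    for start, end in segments:
        # Round to frame indices so floating-point noise in the times cannot shift a boundary.
        for i in range(max(0, round(start / hop_s)), min(n_frames, round(end / hop_s))):
            labels[i] = True
    return labels


def frame_accuracy(pred: Sequence[bool], ref: Sequence[bool]) -> float:
    """Fraction of frames whose predicted label matches the reference annotation."""
    if len(pred) != len(ref):
        raise ValueError("prediction and reference must have the same number of frames")
    return sum(bool(p) == bool(r) for p, r in zip(pred, ref)) / len(ref)


ref = [0, 1, 1, 1, 0, 0, 1, 1]
segs = frames_to_segments(ref)
print(segs)                                              # [(0.01, 0.04), (0.06, 0.08)]
print(frame_accuracy([0, 1, 1, 0, 0, 0, 1, 1], ref))     # 0.875
```

Storing annotations as time segments keeps them independent of any particular frame size, while frame-level flags are what a frame-based VAD produces and what accuracy is computed on.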