Research On Robustness Of Voice Endpoint Detection

Posted on:2020-12-13

Degree:Master

Type:Thesis

Country:China

Candidate:W Chen

Full Text:PDF

GTID:2438330590957605

Subject:Electronic and communication engineering

Abstract/Summary:

With advances in information technology and machine intelligence, technologies such as automatic speech recognition (ASR) and automatic speech enhancement (ASE) are increasingly part of everyday life. As the Internet of Things grows, voice, serving both as a means of interaction and as a biometric identifier, will provide strong technical support for future consumer electronics. Voice activity detection (VAD) separates speech segments from non-speech segments in an audio signal, and its quality directly affects the performance of speech processing technologies such as ASR and ASE. A VAD algorithm typically consists of three parts: first, preprocessing of the speech signal, mainly pre-emphasis, framing and windowing; second, feature extraction, mainly frequency-domain and time-domain features; and third, a classifier that decides between speech and non-speech.

The detection performance of current VAD algorithms drops sharply when the signal-to-noise ratio (SNR) is low and the noise environment is complex. This thesis works on all three parts (preprocessing, feature extraction and choice of discriminant model) and tries a series of improvements to raise the accuracy and real-time performance of endpoint detection at low SNR. First, because the VAD algorithm based on the double-threshold decision method performs poorly at low SNR, this thesis uses the Kullback-Leibler (KL) divergence of the power spectral density of the speech signal as a feature for discriminating speech from non-speech and, combining it with an order statistics filter (OSF) and an adaptive threshold, proposes a VAD algorithm based on KL divergence with an adaptive threshold. Second, because traditional discriminant models cannot exploit the long- and short-term context of the speech signal, a long short-term memory (LSTM) network is trained as the speech/non-speech discriminant model so that the temporal information in the signal is fully used. Building on the LSTM network, KL divergence features, Mel-frequency cepstral coefficient (MFCC) features and the OSF, this thesis proposes a speech endpoint detection algorithm based on an LSTM neural network.

Finally, a script for annotating speech endpoints in the data was implemented in Python. Using the annotated data, several typical VAD algorithms and the two improved algorithms are simulated and analyzed, and their mathematical models are given. The experimental results show that the two proposed VAD algorithms achieve higher detection accuracy, better robustness and better real-time performance.
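
The first proposed algorithm is summarized above only at the block-diagram level (preprocessing, KL divergence of the power spectral density, OSF smoothing, adaptive threshold). The following is a minimal NumPy sketch of a pipeline of that kind, not the thesis's implementation. Everything specific in it is an assumption chosen for illustration: the function names, frame length 256 and hop 128, pre-emphasis coefficient 0.97, a noise reference spectrum averaged over the first 10 frames (assumed to be non-speech), the direction D_KL(P_t || Q) with P_t the normalized frame spectrum and Q the noise reference, a 7-frame median as the order statistic, and a threshold of mu + k * sigma whose noise statistics are updated only on frames judged to be non-speech.

```python
import numpy as np


def preprocess(x, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasis, framing and Hamming windowing (parameter values are assumed)."""
    x = np.asarray(x, dtype=float)
    # Pre-emphasis: y[n] = x[n] - alpha * x[n - 1]
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    n_frames = 1 + (len(y) - frame_len) // hop
    # Row t holds samples [t * hop, t * hop + frame_len)
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return y[idx] * np.hamming(frame_len)


def normalized_psd(frames, eps=1e-12):
    """Per-frame power spectrum scaled to sum to 1 so it can be treated as a distribution."""
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps
    return power / power.sum(axis=1, keepdims=True)


def kl_divergence(p, q):
    """D_KL(p_t || q) for every row p_t of p against one reference distribution q."""
    return np.sum(p * np.log(p / q), axis=1)


def order_statistics_filter(x, half_width=3, rank=0.5):
    """Replace each value by an order statistic of its (2 * half_width + 1)-frame window.

    rank=0.5 selects the median; other ranks select other sample quantiles.
    """
    padded = np.pad(x, half_width, mode="edge")
    size = 2 * half_width + 1
    out = np.empty(len(x))
    for i in range(len(x)):
        window = np.sort(padded[i:i + size])
        out[i] = window[int(round(rank * (size - 1)))]
    return out


def adaptive_threshold(feature, n_init=10, k=3.0, lam=0.95, floor_ratio=0.5):
    """Flag frames whose feature exceeds mu + k * sigma of the tracked noise statistics."""
    mu = feature[:n_init].mean()
    var = feature[:n_init].var()
    is_speech = np.zeros(len(feature), dtype=bool)
    for i, f in enumerate(feature):
        # Floor sigma so a very stable noise-only start does not give a hair-trigger threshold
        sigma = max(np.sqrt(var), floor_ratio * mu)
        is_speech[i] = f > mu + k * sigma
        if not is_speech[i]:
            # Update the noise statistics only on frames judged to be non-speech
            mu = lam * mu + (1 - lam) * f
            var = lam * var + (1 - lam) * (f - mu) ** 2
    return is_speech


def kl_osf_vad(x, n_init=10):
    """Frame-level speech/non-speech decisions; assumes the first n_init frames are noise."""
    p = normalized_psd(preprocess(x))
    noise_ref = p[:n_init].mean(axis=0)
    feature = order_statistics_filter(kl_divergence(p, noise_ref))
    return adaptive_threshold(feature, n_init=n_init)
```

A quick sanity check is to feed the sketch white noise with a tone burst in the middle: frames inside the burst move the spectrum far from the noise reference and are flagged, while the median filter suppresses isolated spikes in noise-only frames. Real speech at low SNR would need these parameters tuned, and the LSTM-based second algorithm is not covered here.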

Keywords/Search Tags:

VAD, LSTM, KL divergence, order statistics filter, low SNR

Related items

1	Semantic Analysis Of Statistical Literature Domain Based On Word2Vec-LSTM Model
2	Research And Application Of Prediction And Classification Model Based On Sequential Feature Analysis
3	Statistical Complexity Measure Analysis Of Gait Signal Based On LMCD And JSD
4	Exploiting smoothness in statistical learning, sequential prediction, and stochastic optimization
5	Distributed Multi-view Target Tracking, Statistical Inference Methods And Achieve
6	Sequential Statistical Signal Processing with Applications to Distributed Systems
7	Sequential Monte Carlo Methods With Applications To Communications
8	Mimu/GPS/Magnetometer Integrated Navigation Sequential Adaptive Filtering Technology
9	Constraint-based Sequential Pattern Mining And Its Applications
10	Research On Statistical Modeling Of SAR Images And Its Application Based On Generalized Gamma Distribution