
Voice Activity Detection Based On Sequential Gaussian Mixture Model

Posted on: 2018-03-11 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Shen
GTID: 2348330542481361 | Subject: Computer technology
Abstract/Summary:
Voice activity detection (VAD) is a binary classifier that partitions a sequence of frames or frequency bins into speech and nonspeech clusters in an online manner. It is an important front end of modern speech signal processing systems, and many systems use VAD as a preprocessor to improve their performance. VAD research has three main branches: robust acoustic features, statistical signal processing, and deep learning methods. Among these, DNN-based VAD performs well when the training scenario matches the test scenario; otherwise, its performance degrades. In addition, the trained model requires extra memory for storage, and its computational load is unacceptable on some mobile computing systems. Conventional statistical models therefore remain useful for VAD. The Gaussian model has conventionally been used to describe the probability density function of speech and nonspeech signals, with classification based on likelihood. However, this conventional technique was never unified into a theoretical framework that enables optimal classification.

This thesis uses a sequential Gaussian mixture model (GMM) to model the logarithmic power sequence in each frequency band. A sequential likelihood function is proposed to estimate the parameter set of this GMM frame by frame. The likelihood function is maximized sequentially with the iterative Newton-Raphson method, and the resulting online estimate takes the form of a first-order regression. Finally, the power sequence is classified as speech or nonspeech according to the maximum-likelihood criterion. Experimental results confirm the superiority of the proposed method. VAD is only one application of the sequential GMM: the proposed method can be extended to many online classification tasks, and it guarantees that the classification error is minimized.
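The abstract describes the method only at a high level. The Python sketch below illustrates the general idea for a single frequency band under assumptions the abstract does not state: two components (nonspeech and speech), an exponentially weighted log-likelihood with a forgetting factor lambda (`forget`), and posterior weights of past frames held fixed. Under these assumptions, one Newton-Raphson step per frame on the weighted likelihood of each mean gives the recursion mu_k <- mu_k + (gamma_k / N_k) * (x_t - mu_k) with N_k <- lambda * N_k + gamma_k, i.e. a first-order regression; the variance uses the analogous recursion. The class name `SequentialGMM2`, the initialization, the floor on N_k (a safeguard added here), and all parameter values are hypothetical; this is not the thesis's implementation.

```python
import math
import random


def gauss_logpdf(x, mean, var):
    """Log-density of the univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


class SequentialGMM2:
    """Hypothetical two-component sequential GMM for ONE frequency band.

    Component 0 models nonspeech (noise) log-power, component 1 models speech.
    Parameters are updated frame by frame with a forgetting factor (an assumption).
    """

    def __init__(self, noise_mean, speech_offset=3.0, init_var=1.0,
                 forget=0.98, min_count=5.0, min_var=1e-2):
        self.mean = [noise_mean, noise_mean + speech_offset]
        self.var = [init_var, init_var]
        self.count = [min_count, min_count]  # effective frame counts N_k
        self.forget = forget                 # forgetting factor lambda in (0, 1)
        self.min_count = min_count           # floor on N_k: bounds the step size
        self.min_var = min_var               # variance floor

    def update(self, x):
        """Classify log-power x by maximum likelihood, then update the model."""
        ll = [gauss_logpdf(x, m, v) for m, v in zip(self.mean, self.var)]
        # ML criterion using the previous frame's model: speech iff p(x|1) > p(x|0).
        is_speech = ll[1] > ll[0]

        # Posterior responsibilities gamma_k (mixture weights = normalized counts),
        # computed in the log domain for numerical stability.
        total = sum(self.count)
        a = [math.log(n / total) + l for n, l in zip(self.count, ll)]
        amax = max(a)
        e = [math.exp(ak - amax) for ak in a]
        s = sum(e)
        gamma = [ek / s for ek in e]

        for k in range(2):
            # N_k(t) = lambda * N_k(t-1) + gamma_k(t), floored at min_count.
            self.count[k] = max(self.forget * self.count[k] + gamma[k], self.min_count)
            step = gamma[k] / self.count[k]  # always in [0, 1]
            # First-order regression: new = old + step * (innovation).
            self.mean[k] += step * (x - self.mean[k])
            # Variance recursion, using the freshly updated mean.
            self.var[k] += step * ((x - self.mean[k]) ** 2 - self.var[k])
            self.var[k] = max(self.var[k], self.min_var)
        return is_speech


def simulate(seed=0):
    """Synthetic log-power track: 200 noise, 100 speech, 200 noise frames."""
    rng = random.Random(seed)
    labels = [0] * 200 + [1] * 100 + [0] * 200
    xs = [rng.gauss(4.0, 0.7) if lab else rng.gauss(0.0, 0.5) for lab in labels]
    # Assumes the first 20 frames are nonspeech, to initialize the noise mean.
    model = SequentialGMM2(noise_mean=sum(xs[:20]) / 20)
    decisions = [model.update(x) for x in xs]
    accuracy = sum(int(d) == lab for d, lab in zip(decisions, labels)) / len(labels)
    return accuracy, model


if __name__ == "__main__":
    acc, model = simulate()
    print(f"frame accuracy: {acc:.3f}")
    print("means:", [round(m, 2) for m in model.mean])
```

Running the file prints the frame accuracy and the two estimated means on a synthetic Gaussian track; real log-power spectra are not exactly Gaussian, so this only demonstrates the mechanics. A full VAD would run one such model per frequency band and combine the per-band decisions into a frame-level decision; the abstract does not say how, so that step is omitted here.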
Keywords/Search Tags:Voice Activity Detection, Gaussian Mixture Model, Speech Presence Probability, Maximum Likelihood