Voice Activity Detection Based On Sequential Gaussian Mixture Model

Posted on: 2018-03-11

Degree: Master

Type: Thesis

Country: China

Candidate: Z Shen

Full Text: PDF

GTID: 2348330542481361

Subject: Computer technology

Abstract/Summary:

Voice activity detection (VAD) is a binary classifier that partitions the frame sequence or the frequency bins into speech and nonspeech clusters in an on-line manner. It is an important front end of modern speech signal processing systems, many of which use VAD as a preprocessor to improve their performance. VAD research has three main branches: robust acoustic features, statistical signal processing, and deep learning methods. Among these, DNN-based voice activity detection performs fairly well when the training scenario matches the test scenario; otherwise, its performance degrades. In addition, the trained model requires extra memory for storage, and the computational load is unacceptable in some mobile computing systems. Conventional statistical models are therefore still useful for VAD. The Gaussian model has conventionally been employed to describe the probability density function of speech and nonspeech signals, with classification conducted on the basis of likelihood. However, the conventional technique was not unified into a theoretical framework that enables optimal classification. This thesis uses a sequential Gaussian mixture model (GMM) to model the logarithmic power sequence in each frequency band. A sequential likelihood function is presented to estimate the parameter set of this GMM frame by frame. The likelihood function is maximized sequentially with the iterative Newton-Raphson method, and the on-line estimate is expressed as a first-order regression. Finally, the power sequence is classified into speech and nonspeech according to the maximum likelihood criterion. Experimental results confirm the superiority of the proposed method. Voice activity detection is just one application of the sequential GMM: the proposed method can be extended to many on-line classification tasks, and it guarantees that the classification error is minimized.
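The frame-by-frame estimation described in the abstract can be illustrated with a minimal Python sketch for a single frequency band. It is not the thesis's algorithm: the abstract does not give the sequential likelihood or the Newton-Raphson step, so the sketch assumes a common exponentially weighted recursive update in their place. The names `SequentialGMM` and `run_vad`, the forgetting factor `alpha`, the variance and weight floors, and the initial parameters are all hypothetical. Each frame is labelled by comparing the weighted likelihoods of the two components.

```python
import math


class SequentialGMM:
    """Two-component 1-D GMM (index 0 = nonspeech, 1 = speech), updated per frame.

    Assumption: an exponentially weighted recursive update stands in for the
    thesis's Newton-Raphson-based sequential estimator, which is not given.
    """

    def __init__(self, mu_noise, mu_speech, var=4.0, alpha=0.95):
        self.mu = [mu_noise, mu_speech]   # component means (log power)
        self.var = [var, var]             # component variances
        self.w = [0.5, 0.5]               # mixture weights
        self.alpha = alpha                # hypothetical forgetting factor

    def _posteriors(self, x):
        # Weighted log-likelihood of x under each component, normalized stably.
        log_p = [
            math.log(self.w[k])
            - 0.5 * math.log(2.0 * math.pi * self.var[k])
            - 0.5 * (x - self.mu[k]) ** 2 / self.var[k]
            for k in range(2)
        ]
        m = max(log_p)
        p = [math.exp(lp - m) for lp in log_p]
        total = p[0] + p[1]
        return [p[0] / total, p[1] / total]

    def update(self, x):
        """Classify one log-power sample, then adapt the model to it."""
        post = self._posteriors(x)
        a = self.alpha
        # First-order recursion on the weights, floored and renormalized.
        w_new = [max(a * self.w[k] + (1.0 - a) * post[k], 1e-6) for k in range(2)]
        s = w_new[0] + w_new[1]
        self.w = [w_new[0] / s, w_new[1] / s]
        for k in range(2):
            # Per-component step size, clamped so it never exceeds 1.
            step = min((1.0 - a) * post[k] / self.w[k], 1.0)
            delta = x - self.mu[k]
            self.mu[k] += step * delta
            self.var[k] = max((1.0 - step) * (self.var[k] + step * delta * delta), 1e-3)
        return post[1] > post[0]          # True means "speech"


def run_vad(log_powers, mu_noise=-3.0, mu_speech=3.0):
    """Return one speech/nonspeech decision per frame for one frequency band."""
    gmm = SequentialGMM(mu_noise, mu_speech)
    return [gmm.update(x) for x in log_powers]


if __name__ == "__main__":
    # Synthetic band: 30 low-power frames followed by 30 high-power frames.
    frames = [-5.0 + 0.5 * ((i % 3) - 1) for i in range(30)]
    frames += [5.0 + 0.5 * ((i % 3) - 1) for i in range(30)]
    print(run_vad(frames))
```

In a full detector one such model would run per frequency band, and the band-level decisions would still need to be combined into a frame-level decision; the abstract does not say how, so that step is left out here.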

Keywords/Search Tags:

Voice Activity Detection, Gaussian Mixture Model, Speech Presence Probability, Maximum Likelihood

Related items

1	Time-Frequency Speech Presence Probability And Noise Power Spectrum Estimation In Noisy Environments
2	Research And Implementation Of Voice Activity Detection Algorithm Based On GMM And SVM Under Complex Environment
3	Research On Coding Algorithm Based On GMM Speech Spectral Envelope Representation
4	Research On The Speech Emotion Recognition Based On Voice Signal
5	Research And Implementation On Speaker Recognition
6	The Study And Application Of Model Cluster Based On EM Algorithm
7	Research On Voice Activity Detection Technology In High Noisy Environment
8	Study On Voice Activity Detection Algorithm Based On General Gaussian Model
9	The Research Of Application And Optimization Of Gaussian Mixture Model In Data Clustering
10	Multi-speaker Recognition Based On Audio Video Information Fusion In Meeting Room Environment