A New Method Based On HMMs For Noise-Robust Voice Activity Detector

Posted on: 2013-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: B Luo
Full Text: PDF
GTID: 2248330377953763
Subject: Computer application technology
Abstract/Summary:
Voice activity detection (VAD) has become an indispensable part of speech and audio processing, including speech recognition, speech classification, and speech coding. As a pre-processing step for speech recognition, even a minor improvement in speech boundary detection improves overall system performance in the long run.

The traditional double-threshold VAD method performs poorly as noise conditions become more complex. Recently, many attractive statistical model-based VAD algorithms using the likelihood ratio test (LRT) have been developed. They have contributed significantly to progress in voice activity detection, especially the statistical methods based on hidden Markov models (HMMs). However, the traditional LRT model is based on a single hidden Markov model with two states, which cannot adequately capture the observation probabilities of the different states. We therefore propose a novel method in which the LRT is based on two HMMs: model 0 for non-speech and model 1 for speech. With this method, small differences between the two patterns can be accumulated across the four states of each model.

This thesis is organized as follows.

First, the applications and significance of VAD in speech and audio processing are described. Then, we introduce research on VAD in China and abroad.

Second, we present the elements of an HMM and the three basic problems for HMMs.

Next, we propose a novel HMM scheme based on two models with four states each to detect voice activity endpoints.

Moreover, several speech features are discussed, such as the short-time fractal dimension and an autocorrelation-based pitch feature. Then, we discuss the second-order difference MFCC, which more closely approximates the human auditory system.

Then we use an LRT that combines the second-order difference MFCC with the two-model, four-state HMMs to decide the endpoints of speech.
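The two-model likelihood ratio test can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a scalar feature per frame instead of a second-order difference MFCC vector, one Gaussian emission per state, and hand-picked placeholder parameters. The names `forward_log_likelihood`, `make_model`, `NOISE_MODEL`, and `SPEECH_MODEL` are hypothetical. Each model has four states, the forward algorithm scores a frame sequence under each model, and the difference of the two log-likelihoods serves as the LRT score.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def gaussian_log_pdf(x, mean, var):
    """Log density of a 1-D Gaussian (assumed emission model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def forward_log_likelihood(obs, log_pi, log_a, means, variances):
    """Forward algorithm in the log domain: returns log P(obs | model)."""
    if not obs:
        raise ValueError("obs must contain at least one frame")
    n = len(log_pi)
    alpha = [log_pi[i] + gaussian_log_pdf(obs[0], means[i], variances[i])
             for i in range(n)]
    for x in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + log_a[i][j] for i in range(n)])
            + gaussian_log_pdf(x, means[j], variances[j])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def make_model(means, variances, self_loop=0.7):
    """Build a placeholder HMM: uniform start, sticky transitions."""
    n = len(means)
    other = (1.0 - self_loop) / (n - 1)
    log_pi = [math.log(1.0 / n)] * n
    log_a = [[math.log(self_loop if i == j else other) for j in range(n)]
             for i in range(n)]
    return log_pi, log_a, means, variances


def log_likelihood_ratio(obs, speech_model, noise_model):
    """LRT score: positive values favour the speech model."""
    return (forward_log_likelihood(obs, *speech_model)
            - forward_log_likelihood(obs, *noise_model))


# Placeholder four-state models (assumed values, not trained parameters).
NOISE_MODEL = make_model([-0.3, -0.1, 0.1, 0.3], [0.05] * 4)
SPEECH_MODEL = make_model([2.0, 2.5, 3.0, 3.5], [0.5] * 4)
```

With these placeholder parameters, `log_likelihood_ratio([2.4, 2.9, 3.1], SPEECH_MODEL, NOISE_MODEL)` is positive and the same call on `[0.0, 0.1, -0.1]` is negative. In practice both models would be trained on labelled speech and non-speech data, for example with Baum-Welch re-estimation.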
In this step, we apply K-means clustering to the LRT values to obtain the threshold between speech and non-speech.

Finally, experimental results show that the proposed HMMs and second-order difference MFCC perform better in complex noise backgrounds than the other methods discussed above.
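The K-means thresholding step can be sketched as follows. This is an illustration under assumptions, not the thesis procedure: it clusters scalar LRT scores into two groups and takes the midpoint of the two centroids as the threshold, which is one plausible reading of "cluster the LRT in order to get the threshold". The function name `two_means_threshold` and the sample scores are hypothetical.

```python
def two_means_threshold(scores, iterations=50):
    """1-D K-means with k=2; returns the midpoint of the two centroids."""
    lo, hi = min(scores), max(scores)
    if lo == hi:
        raise ValueError("scores must contain at least two distinct values")
    for _ in range(iterations):
        # Assign each score to the nearer centroid (ties go to the low one).
        low_cluster = [s for s in scores if abs(s - lo) <= abs(s - hi)]
        high_cluster = [s for s in scores if abs(s - lo) > abs(s - hi)]
        new_lo = sum(low_cluster) / len(low_cluster)
        new_hi = sum(high_cluster) / len(high_cluster)
        if new_lo == lo and new_hi == hi:
            break  # Centroids stopped moving: converged.
        lo, hi = new_lo, new_hi
    # Assumed decision rule: threshold halfway between the centroids.
    return (lo + hi) / 2.0


# Made-up LRT scores for illustration only.
example_scores = [-5.0, -4.0, -6.0, 3.0, 4.0, 5.0]
threshold = two_means_threshold(example_scores)
is_speech = [score > threshold for score in example_scores]
```

Frames whose LRT score exceeds the threshold are labelled speech. Initialising the centroids at the minimum and maximum score keeps both clusters non-empty on every iteration.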
Keywords/Search Tags:VAD, speech and audio processing, characteristic properties, HMM, LRT, K-means cluster