Research On Voice Endpoint Detection Method In Noisy Environment

Posted on: 2022-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Luo
Full Text: PDF
GTID: 2518306524451874
Subject: Electronics and Communications Engineering
Abstract/Summary:
The main purpose of voice endpoint detection is to separate the voice segments from the non-voice segments of a speech signal. Speech signals are often accompanied by various kinds of noise, however, and the presence of noise directly degrades endpoint detection performance. Starting from endpoint detection methods based on characteristic parameters, this paper studies voice endpoint detection in noisy environments. The specific research work covers the following aspects.

Firstly, to address the poor robustness of the features used by single-feature endpoint detection methods at low signal-to-noise ratios, this paper introduces the first coefficient (GFCC0) of the Gammatone frequency cepstral coefficients (GFCC) of the speech signal into the endpoint detection task, and combines it with multi-window spectral subtraction to detect the endpoints of the speech signal. In four noise environments, including babble and volvo, the GFCC0 feature achieves higher detection accuracy than the spectral entropy method and the logarithmic spectrum distance method. Although adding multi-window spectral subtraction increases the detection time, it further improves the detection accuracy of the GFCC0 feature under low signal-to-noise ratio babble and volvo noise.

Secondly, to address the insufficient performance of endpoint detection methods based on multi-feature fusion in complex noise environments, this paper proposes a fusion feature that combines the Gammatone frequency cepstral coefficients (GFCC) and the Mel frequency cepstral coefficients (MFCC). The GFCC0 and MFCC0 features of the speech signal are multiplied to construct the first type of fusion feature. This feature tracks the voice segment effectively, but in some noise environments its ability to track unvoiced sounds within the speech segment is slightly insufficient.

Thirdly, to address the limited ability of the first type of fusion feature to track unvoiced segments, this paper proposes an adaptive weighted fusion method. It uses the projection feature, which tracks unvoiced sounds well, and the band-partitioning spectral entropy feature, which tracks voiced sounds well, to improve the ability of the GFCC0 feature to track both unvoiced and voiced sounds. The result is a second type of fusion feature that accounts for both unvoiced and voiced sounds in the speech segment.

Finally, because endpoint recognition with fixed thresholds limits endpoint detection performance, this paper uses a double threshold method with adaptively estimated thresholds as the endpoint recognition method. It is applied to each of the two types of fusion features to detect the endpoints of noisy speech signals. Experimental results in seven noise environments, including babble and volvo, show that the first type of fusion feature effectively improves endpoint detection accuracy in five noise environments, while the second type of fusion feature outperforms the comparison algorithms in all seven. In the volvo noise environment in particular, the detection accuracy exceeds 94.5%.
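The product fusion and double threshold steps summarized above can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes the per-frame GFCC0 and MFCC0 tracks have already been extracted, and the function names, the min-max normalization, the use of leading frames as a noise estimate, and the constants `noise_frames`, `k_low`, `k_high`, `min_len` and the 0.01 spread floor are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Scale a feature track to [0, 1]; a flat track maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_by_product(feature_a, feature_b):
    """Multiply two normalized per-frame tracks frame by frame."""
    a = min_max_normalize(feature_a)
    b = min_max_normalize(feature_b)
    return [x * y for x, y in zip(a, b)]


def estimate_thresholds(track, noise_frames=5, k_low=2.0, k_high=4.0):
    """Derive both thresholds from leading frames assumed to hold only noise."""
    lead = track[:noise_frames]
    mu = mean(lead)
    # Assumed floor so the thresholds do not collapse onto a very quiet noise floor.
    spread = max(pstdev(lead), 0.01)
    return mu + k_low * spread, mu + k_high * spread


def detect_segments(track, low, high, min_len=3):
    """Keep runs above `low` that exceed `high` at least once and last `min_len` frames."""
    segments = []
    start = None
    peaked = False
    for i, value in enumerate(track):
        if value > low:
            if start is None:
                start = i
                peaked = False
            if value > high:
                peaked = True
        elif start is not None:
            if peaked and i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the track.
    if start is not None and peaked and len(track) - start >= min_len:
        segments.append((start, len(track) - 1))
    return segments


# Toy per-frame values standing in for GFCC0 and MFCC0 (made up for illustration).
gfcc0 = [0.1, 0.12, 0.09, 0.11, 0.1, 0.5, 0.9, 1.0, 0.95, 0.6, 0.1, 0.11, 0.1]
mfcc0 = [0.2, 0.18, 0.22, 0.2, 0.19, 0.7, 1.0, 0.9, 0.95, 0.5, 0.2, 0.21, 0.2]

fused = fuse_by_product(gfcc0, mfcc0)
low, high = estimate_thresholds(fused)
print(detect_segments(fused, low, high))
```

Multiplying the normalized tracks suppresses frames where either feature is close to its noise floor, which is one plausible reason a product fusion sharpens the contrast between speech and noise. Estimating the thresholds from the signal itself, rather than fixing them in advance, is the general idea behind the adaptive double threshold step.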
Keywords/Search Tags:voice endpoint detection, Gammatone frequency cepstral coefficient(GFCC), Mel frequency cepstral coefficient (MFCC), band-partitioning spectral entropy, multi-feature fusion