| Voice activity detection (VAD) locates the start and end points of a voice signal against background noise, which improves detection accuracy and reduces the time spent on voice signal processing. Traditional VAD algorithms perform well in theory, but their performance degrades rapidly in noise-corrupted real environments. Meanwhile, portable applications place strict demands on the space and time complexity of the VAD algorithm. This paper therefore seeks an effective and robust VAD for practical applications. The study approaches a practical, better-performing algorithm from three perspectives: noise estimation, Wiener filtering, and endpoint detection. First, a rapid and robust noise estimate is obtained by applying minima-controlled recursive averaging (MCRA). This improves on the accuracy of traditional VAD noise estimation and reduces the interference of background noise with the voice signal. Combining TSNR with non-linear harmonic enhancement lets the Wiener-filtering speech enhancement attain the largest increase in average segSNR and enhanced perceptual voice quality. Second, this paper analyzes audio features frequently used in speech endpoint detection, such as short-time energy, short-time entropy, and the short-term relative amplitude derived from short-term amplitude, and then proposes EHR. EHR is an audio feature that combines the time domain and the frequency domain and effectively distinguishes speech from non-speech under noisy backgrounds. Based on EHR, an effective and adaptive VAD is put forward that accurately detects the start and end points of speech in complex environments. Finally, the method is verified on a noisy speech database and on a self-collected and arranged database, and the detection results are analyzed and evaluated. 
Experimental results show that the VAD is effective and robust: it completes its computation quickly and consistently under noisy backgrounds and performs better in practical applications. |