
Methods Of Speech Endpoints Detection In Noisy Environments

Posted on: 2013-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: D L Hu
Full Text: PDF
GTID: 2248330377453865
Subject: Applied Mathematics
Abstract/Summary:
Endpoint detection (voice activity detection, VAD) finds the start and end of the speech segment in an input signal; that is, it separates the speech segment from the background noise so that only the useful data is passed to the speech recognition system. After several years of research, VAD technology has advanced greatly and achieves satisfactory results in the laboratory. In practice, however, there are many types of noise, and their presence degrades the detection results. It is therefore important to study VAD methods for noisy environments.

The detection methods proposed so far fall into two classes. The first is feature-based, since many features differ between the speech signal and the noise signal. In this approach, the features are first extracted and then compared with preset thresholds, and the speech is separated from the noise on the basis of that comparison. The second is model-based: the parameters of the speech and noise models must be estimated. The feature-based approach is easy to understand and implement, and has therefore been widely used. However, when the signal-to-noise ratio (SNR) is low, the speech is severely affected by the noise, or even submerged in it, and the detection results deteriorate. The model-based approach has a high computational cost and complexity, so it has difficulty meeting the demands of real-time systems.

In this thesis, feature-based VAD methods are studied and evaluated in simulation experiments, and several improvements are proposed to make detection more robust in noisy environments. The main contents are as follows:

Firstly, the noisy signal is de-noised with wavelets to suppress the noise, and detection with decision trees is proposed to improve on the traditional double-threshold method. The simulation experiments indicate that the decision-tree method works better than the traditional one, and that it moderates, to some extent, the traditional method's loss of accuracy as the SNR drops.

Secondly, improvements are made to the adaptive band-partitioning spectral entropy algorithm. Before the adaptive band-partitioning spectral entropy is calculated, the noise level of the noisy signal is estimated to decide whether de-noising is necessary, because de-noising brings little benefit for a signal in a high-SNR environment. The probability formula based on the power spectrum of each subband is then tested in MATLAB, and the simulation results show that the improved formula represents the speech segment much better than the original one. Compared with some other methods, the improved method based on adaptive band-partitioning spectral entropy is more robust.

Thirdly, subtractive clustering and k-means clustering are applied to voice activity detection, followed by simulation and analysis.
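
As an illustration of the feature-based idea described in the abstract (extract a feature per frame, then compare it with preset thresholds), the following is a minimal Python sketch of a double-threshold detector. It is not the thesis's implementation, which is reported as tested in MATLAB and is not reproduced on this page. The traditional double-threshold method is usually described with short-time energy and zero-crossing rate; this sketch uses short-time energy only, and the function names, frame sizes, and threshold values are all assumptions chosen for illustration.

```python
import numpy as np


def short_time_energy(signal, frame_len=256, hop=128):
    """Sum of squared samples in each (overlapping) frame.

    frame_len and hop are illustrative defaults, not values from the thesis.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    return np.sum(frames ** 2, axis=1)


def double_threshold_vad(energy, low, high):
    """Return a boolean mask of frames judged to be speech.

    Frames whose energy exceeds `high` are taken as speech; each such
    region is then grown outward for as long as the energy stays above
    `low`. Frames that exceed only `low` and touch no `high` frame are
    rejected.
    """
    energy = np.asarray(energy, dtype=float)
    seeds = energy > high
    mask = seeds.copy()
    n = len(energy)
    for i in np.flatnonzero(seeds):
        j = i - 1
        while j >= 0 and energy[j] > low and not mask[j]:
            mask[j] = True
            j -= 1
        j = i + 1
        while j < n and energy[j] > low and not mask[j]:
            mask[j] = True
            j += 1
    return mask


if __name__ == "__main__":
    # Synthetic example (assumed data): weak noise with a sine burst inside.
    rng = np.random.default_rng(0)
    x = 0.01 * rng.standard_normal(8000)
    x[3000:5000] += np.sin(2 * np.pi * 200 * np.arange(2000) / 8000)
    e = short_time_energy(x)
    noise_level = e[:10].mean()  # assumes the first frames contain noise only
    detected = double_threshold_vad(e, low=3 * noise_level, high=10 * noise_level)
    print(np.flatnonzero(detected))
```

Using two thresholds rather than one gives a simple hysteresis: the high threshold guards against noise spikes being taken for speech, while the low threshold recovers the weaker onset and offset of a segment. As the abstract notes, fixed thresholds of this kind become unreliable when the SNR is low.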
Keywords/Search Tags: Speech Endpoints Detection (Voice Activity Detection), Decision Trees, Adaptive Band-partitioning Spectral Entropy, Noise Estimation, K-means Clustering