Research On Speech Detection Method In Noise Environments

Posted on: 2016-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Qiu
Full Text: PDF
GTID: 2308330473955855
Subject: Signal and Information Processing
Abstract/Summary:
Speech detection, also known as voice activity detection, is the technology that automatically distinguishes active speech from the non-speech parts of an audio signal. It has become an important component of speech signal processing and is widely applied in research fields such as speech coding, speech enhancement and speech recognition. Existing voice activity detection methods achieve good detection results in high signal-to-noise ratio (SNR) environments, but their performance drops sharply in low-SNR environments. The study of robust voice activity detection methods for low-SNR environments is therefore of great significance.

This thesis focuses on voice activity detection based on machine learning. Building on a study and analysis of existing detection methods, we improve two voice activity detection methods so as to raise speech detection performance in low-SNR environments. The main work and innovations are as follows:

(1) Weighted-learning voice activity detection based on robust features

This method addresses several problems of the voice activity detection method that uses harmonic frequency components in the likelihood ratio test. The noise variance is key to calculating the likelihood ratio in voice activity detection. To increase the accuracy of the likelihood ratio calculation, we use the unbiased minimum mean-square error method for noise estimation. In low-SNR environments, the harmonic peaks of voiced frames are seriously disturbed by noise, so they cannot sufficiently raise the total score of the original multi-frame likelihood ratio. To solve this problem, we propose a new feature, called the likelihood geometric mean based on long-term spectral peaks.
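
To illustrate why the noise variance matters so much to the likelihood ratio, the sketch below computes a frame score in the style of a standard Gaussian statistical-model detector: a per-bin likelihood ratio, then the geometric mean over bins (an average in the log domain). It is not the thesis's implementation. The maximum-likelihood a priori SNR estimate, the function names and the threshold value are assumptions made for this example, and the long-term spectral peak feature, which the abstract does not specify, is not reproduced here.

```python
import math

def log_likelihood_ratios(power_spectrum, noise_variance):
    """Per-bin log likelihood ratio under a complex Gaussian model."""
    llrs = []
    for power, noise in zip(power_spectrum, noise_variance):
        gamma = power / noise        # a posteriori SNR; depends directly on the noise estimate
        xi = max(gamma - 1.0, 0.0)   # ML estimate of the a priori SNR (assumption)
        llrs.append(gamma * xi / (1.0 + xi) - math.log1p(xi))
    return llrs

def frame_score(power_spectrum, noise_variance):
    """Log of the geometric mean of the per-bin likelihood ratios."""
    llrs = log_likelihood_ratios(power_spectrum, noise_variance)
    return sum(llrs) / len(llrs)

def is_speech(power_spectrum, noise_variance, threshold=0.5):
    """Declare speech when the frame score exceeds a hypothetical threshold."""
    return frame_score(power_spectrum, noise_variance) > threshold
```

Because every bin's ratio is divided by the estimated noise variance, an overestimated variance lowers the score of weak speech frames and an underestimated one raises the score of noise frames, which is why a more accurate noise estimator helps the test.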
We form a new feature vector by incorporating this feature into the original multi-observation feature vector. To eliminate the drawback of the equally weighted model, we improve the decision rule of voice activity detection by weighting the new feature vector, with the weights obtained by the minimum classification error discriminative training method. The experimental results show that the method improves the detection of weaker speech frames and thereby improves the overall performance of voice activity detection.

(2) Voice activity detection based on noise and signal classification

The detection model for voice activity detection based on signal classification is trained on noisy speech mixed from multi-condition noise environments. It does not take the noise type into account, which affects detection performance. Therefore, to improve detection performance, we introduce a noise classification method as the first step of the voice activity detection method. To improve the accuracy of noise classification, we use the perceptual wavelet packet transform to extract noise features. Because the binary classification problem requires highly separable signal features, we extract signal features using multitaper MFCC, which is highly robust. The experimental results show that the method effectively improves the detection performance of voice activity detection.
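
The weighted decision rule of item (1) can be illustrated with a small sketch of minimum classification error training. The abstract gives no formulas, so this example assumes the common formulation: a linear score over the feature vector, a sigmoid-smoothed error, and stochastic gradient descent starting from equal weights. The toy feature values, threshold, learning rate and function names are all hypothetical.

```python
import math

def score(weights, features, threshold):
    """Weighted sum of the per-feature scores minus the decision threshold."""
    return sum(w * f for w, f in zip(weights, features)) - threshold

def mce_loss(weights, features, label, threshold, beta=1.0):
    """Sigmoid-smoothed classification error; label is +1 (speech) or -1 (noise)."""
    d = -label * score(weights, features, threshold)  # d > 0 means misclassified
    return 1.0 / (1.0 + math.exp(-beta * d))

def mce_train(data, n_features, threshold=0.0, beta=1.0, lr=0.5, epochs=200):
    """Stochastic gradient descent on the smoothed error, starting from equal weights."""
    weights = [1.0 / n_features] * n_features
    for _ in range(epochs):
        for features, label in data:
            loss = mce_loss(weights, features, label, threshold, beta)
            grad_scale = beta * loss * (1.0 - loss) * (-label)  # d(loss)/d(score)
            weights = [w - lr * grad_scale * f for w, f in zip(weights, features)]
    return weights

# Hypothetical two-feature frames: the first feature is informative, the second is noisy.
toy_frames = [([2.0, -1.0], 1), ([1.5, 0.5], 1), ([-1.0, 1.5], -1), ([-2.0, -0.5], -1)]
trained_weights = mce_train(toy_frames, n_features=2)
```

With equal weights, the third toy frame (noise) receives a positive score and is misclassified. Training shifts weight toward the informative feature, which is the motivation for replacing the equally weighted model.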
Keywords/Search Tags: voice activity detection, noise estimation, long-term spectral peaks, minimum classification error, noise classification