Convolutional Neural Networks For Voice Activity Detection

Posted on: 2016-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: H X Wang
GTID: 2308330461483105
Subject: Computer technology
Abstract/Summary:
Voice activity detection is an important preprocessing step in speech recognition systems. It separates speech from non-speech; in other words, it detects the start and end points of speech within a signal that contains speech. As the first stage of a speech recognition system, it is especially critical in noisy environments, because its accuracy determines the success of the subsequent work.

This thesis implements several common voice activity detection algorithms and analyzes their respective advantages and disadvantages. The short-time energy and zero-crossing rate algorithm combines two observations: speech segments have higher energy than noise segments, and the zero-crossing rate is high for speech segments such as unvoiced sounds, nasals, and fricatives. This method detects well on clean speech, but its performance degrades as the noise increases. The information-entropy algorithm is based on the observation that the information entropy of a speech frame is lower than that of a background-noise frame. It still gives good results at low signal-to-noise ratios, but it is less effective for some types of noise. The EZEf algorithm combines the advantages of time-domain and frequency-domain features. Experiments show that it can still detect voice activity reliably in a noisy environment, but there is no good way to determine its threshold.

To address the shortcomings of these three methods, this thesis uses convolutional neural networks for voice activity detection. The algorithm takes several successive speech frames as input and alternates convolution and sub-sampling, gradually extracting a variety of complex features and thereby improving network performance.
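The alternation of convolution and sub-sampling over a block of successive frames can be illustrated with a minimal sketch. It is not the thesis's network (which was built in Matlab and whose layer sizes are not given in the abstract): the input size, the kernel weights, and the use of mean pooling for sub-sampling are all assumptions made for illustration, and the nonlinearity and training are omitted.

```python
def conv2d_valid(image, kernel):
    """Valid 2-D cross-correlation (the operation CNN layers call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def mean_pool(feature_map, size=2):
    """Sub-sample by averaging non-overlapping size x size blocks."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            sum(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size)) / (size * size)
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical input: 5 successive frames (rows) x 6 feature values per frame.
frames = [[float(r + c) for c in range(6)] for r in range(5)]

# Hypothetical 2x3 kernel; a trained network would learn these weights.
kernel = [[1.0, 0.0, -1.0],
          [1.0, 0.0, -1.0]]

conv_out = conv2d_valid(frames, kernel)  # 4 x 4 feature map
pooled = mean_pool(conv_out, size=2)     # 2 x 2 map after sub-sampling
```

Because the same small kernel is reused at every position, a layer of this kind has far fewer free parameters than a fully connected layer over the same input, which is the property the abstract credits for part of the accuracy gain.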
The noisy speech used in this thesis is produced by mixing clean speech with noise from the NOISEX-92 noise library. The experiments are carried out in Matlab. They show that the algorithm achieves higher accuracy in voice activity detection, owing to its more comprehensive feature extraction and its smaller number of free parameters.
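Mixing clean speech with a noise recording is commonly done by scaling the noise to reach a chosen signal-to-noise ratio before adding it. The sketch below shows one way to do this in Python rather than Matlab. The function name `mix_at_snr`, the 5 dB target, the 8 kHz sample rate, and the synthetic tone and Gaussian noise are all assumptions for illustration; the abstract does not state the SNR levels or the mixing procedure actually used.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals snr_db, then add it."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise segment has zero power")
    # Target noise power is p_clean / 10^(snr_db / 10); gain is applied to amplitude.
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * x for x in noise]
    return [c + n for c, n in zip(clean, scaled)], scaled


# Synthetic stand-ins: a 200 Hz tone for clean speech, Gaussian noise for a noise recording.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

noisy, scaled_noise = mix_at_snr(clean, noise, snr_db=5.0)

# Check the SNR that was actually achieved.
p_clean = sum(x * x for x in clean) / len(clean)
p_scaled = sum(x * x for x in scaled_noise) / len(scaled_noise)
achieved_snr_db = 10.0 * math.log10(p_clean / p_scaled)
```

In practice the clean and noise signals would be read from audio files at a common sample rate, and the same routine could be repeated over several SNR values to build test sets of increasing difficulty.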
Keywords/Search Tags: speech activity detection, convolutional neural networks, short-time energy, zero-crossing rate