Convolutional Neural Networks For Voice Activity Detection

Posted on:2016-09-16

Degree:Master

Type:Thesis

Country:China

Candidate:H X Wang

Full Text:PDF

GTID:2308330461483105

Subject:Computer technology

Abstract/Summary:

Voice activity detection is an important preprocessing step in speech recognition systems. Its purpose is to separate speech from non-speech, in other words, to detect the start and end points of speech in a signal that contains speech. As the first stage of a speech recognition system, especially in noisy environments, its accuracy largely determines the success of the subsequent processing.

This thesis implements several common voice activity detection algorithms and analyzes their respective advantages and disadvantages. The short-time energy and zero-crossing rate algorithm exploits two properties: the energy of speech segments is larger than that of noise segments, and the zero-crossing rate is higher in certain speech segments, such as unvoiced sounds, nasals, and fricatives. This method performs well on clean speech, but its performance degrades as the noise level increases. The information entropy algorithm relies on the observation that the information entropy of a speech frame is lower than that of a background noise frame. It still gives good results at low signal-to-noise ratios, but it is less effective for some types of noise. The EZEf algorithm combines the advantages of time-domain and frequency-domain features. Experiments show that it can still detect voice activity reliably in noisy environments, but there is no good way to determine its threshold.

To address the shortcomings of these three methods, this thesis applies convolutional neural networks to voice activity detection. The network takes several successive speech frames as input and alternates convolution and sub-sampling layers, gradually extracting a variety of complex features and thereby improving network performance.
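As an illustration of the short-time energy and zero-crossing baseline mentioned in the abstract, the following is a minimal sketch in Python (the thesis itself used Matlab). The function names, frame sizes, thresholds, and the decision rule are assumptions made for this example and are not taken from the thesis.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into frames of frame_len, advancing by hop (no padding)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_speech(samples, frame_len=256, hop=128,
                  energy_high=10.0, energy_low=1.0, zcr_thresh=0.25):
    """Return one boolean per frame.

    Assumed rule (illustrative only): a frame counts as speech if its energy
    is clearly high, or if its energy is moderate and its zero-crossing rate
    is high, which is typical of unvoiced sounds.
    """
    flags = []
    for frame in frame_signal(samples, frame_len, hop):
        energy = short_time_energy(frame)
        zcr = zero_crossing_rate(frame)
        flags.append(energy > energy_high
                     or (energy > energy_low and zcr > zcr_thresh))
    return flags
```

The fixed thresholds here are the weak point the abstract describes: values that suit clean speech stop working as the noise level rises, because noise frames also gain energy and zero crossings.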
The noisy speech used in this thesis is produced by mixing clean speech with noise from the NOISEX-92 noise library, and the experiments are carried out in Matlab. The experiments show that the algorithm achieves higher accuracy in voice activity detection, owing to its more comprehensive feature extraction and its smaller number of free parameters.
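The abstract does not say how the clean speech and noise were combined. One common approach, shown here only as a hedged sketch, is to scale the noise so that the mixture reaches a chosen signal-to-noise ratio. The function names `mean_power` and `mix_at_snr` are hypothetical, and the sketch is in Python rather than the Matlab used in the thesis.

```python
import math


def mean_power(samples):
    """Average of squared samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio equals snr_db.

    The noise is truncated to the speech length and then rescaled; both inputs
    are plain lists of floats sampled at the same rate (an assumption).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise has zero power and cannot be scaled")
    # Target noise power is speech power divided by the linear SNR.
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(speech, noise)]
```

Calling `mix_at_snr(speech, noise, 0)` would give a mixture in which speech and noise have equal power; lower values correspond to the low signal-to-noise conditions the abstract refers to.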

Keywords/Search Tags:

speech activity detection, convolutional neural networks, short-time energy, zero-crossing rate

Related items

1	A Study On Robust Speech Endpoint Detection Algorithms In Noisy Environment
2	The Application And Research Based On DSP Speech Process System
3	Low Rate Speech Coding Research-based Speech Recognition And Synthesis
4	Research On Time-domain Voice Activity Detection In Noise Environment
5	Research On In-car Speech Recognition Based On One-dimensional Convolutional Neural Networks
6	The Study On Speech Enhancement Algorithm Based On CS Theory
7	Speech Enhancement Based On Deep Neural Network And Recurrent Neural Network
8	Speech Endpoint Detection Based On Statistical Models
9	The Research On The Voice Activity Detection In The Environment With High Noise
10	Research On Low Bit Rate Speech Coding Based On Perception