Voice Activity Detection With Deep Learning

Posted on: 2017-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: S B Tong
Full Text: PDF
GTID: 2428330590491526
Subject: Computer Science and Technology
Abstract/Summary:
As one of the most important carriers of communication, audio plays an indispensable role in human life. Voice activity detection (VAD) is a speech processing technique that detects the presence or absence of human speech. It is widely used in speech applications such as automatic speech recognition (ASR), speech synthesis, speech coding and speech enhancement, and it can directly influence their performance. Traditional shallow machine learning models have reached a performance bottleneck. Since 2006, deep learning, proposed by Hinton, has drawn much attention in both industry and academia. It has a strong capacity to learn non-linear relationships in data, which makes it promising for real-world VAD tasks. In this thesis, we explore the application of various deep learning approaches to the VAD problem. First, we introduce the deep neural network, the recurrent neural network and the convolutional neural network, and we conduct a thorough comparative study of the robustness of these approaches for VAD, especially under noisy conditions. We also find that conventional VAD evaluation criteria are mostly based on the frame-level accuracy of speech/non-speech classification, which may correlate only weakly with ASR performance. To address this, we propose an integrated VAD evaluation criterion that takes various boundary effects into account. In addition, we propose a novel neural network based VAD framework in which the input to the neural network is augmented with estimates of the noise and of the noisy speech. These additional representations give the network more cues for learning the relationship between noisy speech and noise.
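The gap between frame-level accuracy and boundary behaviour noted in the abstract can be illustrated with a minimal sketch. This is not the integrated criterion proposed in the thesis, whose definition is not given here. The function names (`frame_accuracy`, `segments`, `clipped_onset_frames`) and the toy label sequences are assumptions made purely for illustration.

```python
# Illustrative sketch only: NOT the thesis's integrated criterion.
# Labels are per-frame, 1 = speech, 0 = non-speech (an assumed encoding).

def frame_accuracy(ref, hyp):
    """Fraction of frames where the hypothesis matches the reference."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    return sum(r == h for r, h in zip(ref, hyp)) / len(ref)

def segments(labels):
    """Return (start, end) pairs for each run of speech frames, end exclusive."""
    segs = []
    start = None
    for i, value in enumerate(labels):
        if value == 1 and start is None:
            start = i                      # a speech run begins
        elif value == 0 and start is not None:
            segs.append((start, i))        # a speech run ends
            start = None
    if start is not None:                  # run extends to the last frame
        segs.append((start, len(labels)))
    return segs

def clipped_onset_frames(ref, hyp):
    """Count leading frames of each reference speech segment that hyp misses."""
    total = 0
    for start, end in segments(ref):
        i = start
        while i < end and hyp[i] == 0:
            i += 1
        total += i - start
    return total

# Toy example: 40 frames, one speech segment spanning frames 10..29.
ref = [0] * 10 + [1] * 20 + [0] * 10
# The hypothesis detects the segment three frames late.
hyp = [0] * 13 + [1] * 17 + [0] * 10

print(frame_accuracy(ref, hyp))        # 37 of 40 frames agree
print(clipped_onset_frames(ref, hyp))  # 3 onset frames are clipped
```

In this toy case the frame-level accuracy stays high even though the start of the utterance is cut off, which is the kind of boundary error that can matter to a downstream recognizer while barely moving a frame-level score.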
Keywords/Search Tags: Deep Learning, Voice Activity Detection, Evaluation Criterion, Speech Recognition, Noise Adaptive Training