
Research On Voice Activity Detection Based On Deep Learning

Posted on: 2018-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: K Liu
Full Text: PDF
GTID: 2348330518498571
Subject: Engineering
Abstract/Summary:
Voice activity detection (VAD) is a technique for marking the start and end points of speech segments in a speech signal, whether or not the signal is corrupted by background noise. It is an important preprocessing step for many speech processing systems. Deep learning (DL) is an information extraction technology based on multi-layer nonlinear transformations; its hierarchical structure makes it useful for mining essential features from large amounts of data. With its rapid development, DL has been widely applied in speech processing and has achieved remarkable results. Research on DL-based VAD is therefore of both theoretical significance and practical value.

Traditional VAD algorithms are simple in theory and computation and are therefore easy to implement in hardware. They achieve high detection accuracy in quiet environments, but their performance is unsatisfactory in many noisy conditions. To improve VAD accuracy at low signal-to-noise ratios (SNR), this thesis analyses existing algorithms and uses DL methods to mine deep features from conventional speech features. It then designs a DL-based VAD algorithm that takes multiple features as input and uses a Deep Belief Network (DBN) and a Stacked Sparse Auto Encoder (SSAE) as its DL models.

(1) Multiple speech feature extraction. Suitable speech features play an important role in voice activity detection. To avoid the adverse effects of relying on a single feature, this thesis analyses the G.729B-VAD and AMR-VAD algorithms in depth. Four speech features are extracted from the G.729B coding algorithm: line spectral frequencies, full-band energy, low-frequency energy and short-time zero-crossing rate. Four more are extracted from the AMR-NB coding algorithm: tone, pitch, complex signal and the sum of signal-to-noise ratios. After normalization, these eight features serve as the inputs of the proposed algorithm.

(2) A VAD algorithm based on deep learning. Four classical deep learning models are analysed in this thesis: the Restricted Boltzmann Machine, the Auto Encoder, Convolutional Neural Networks and Recurrent Neural Networks. A Deep Belief Network and a Stacked Sparse Auto Encoder, both fed with the multiple speech features, are then built on the MATLAB software platform to extract deeper features from the raw ones through unsupervised and supervised training. Finally, a Softmax classifier decides whether each frame is a speech frame or not.

To verify the performance of the proposed algorithm, voice activity detection experiments are conducted on the TIMIT speech database and the NOISEX-92 noise library. The experimental results show that the proposed method improves detection accuracy over G.729B-VAD and AMR-VAD in low-SNR environments. A noise adaptability experiment based on the SSAE further shows that the proposed algorithm works well even in an unknown noise environment.
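The pipeline described above (frame-level features, normalization, Softmax decision) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the thesis uses eight features taken from G.729B and AMR-NB and a DBN or SSAE built in MATLAB, whereas this sketch computes only two simple features (log energy and zero-crossing rate) and replaces the deep network with a single linear layer. All function names, the min-max normalization choice, and any weights passed to `classify_frame` are assumptions made for illustration.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a sequence of samples into frames (any incomplete tail is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame):
    """Full-band log energy of one frame in dB, floored to avoid log(0)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def min_max_normalize(values):
    """Scale feature values to [0, 1]; a constant input maps to all zeros.
    Min-max scaling is an assumption; the abstract does not name the method."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_frame(features, weights, biases):
    """Return [P(non-speech), P(speech)] from one linear layer plus softmax.
    In the thesis the softmax input comes from a trained DBN or SSAE; here
    the weights are hypothetical values supplied by the caller."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)
```

In use, each frame's normalized feature vector would be passed to `classify_frame`, and the frame would be labelled speech when the second probability exceeds 0.5. Replacing the linear layer with learned deep features is where the DBN and SSAE of the thesis would come in.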
Keywords/Search Tags:Voice Activity Detection, Deep Learning, Deep Belief Networks, Stacked Sparse AutoEncoder