Voice Activity Detection Based On Deep Learning

Posted on:2021-05-20

Degree:Master

Type:Thesis

Country:China

Candidate:T J Xu

Full Text:PDF

GTID:2428330620476428

Subject:Computer Science and Technology

Abstract/Summary:


Speech is the most natural medium for human-to-human and human-to-machine interaction. In real environments, noise is always present. It degrades communication quality as well as the performance of automatic speech recognition (ASR) and speaker recognition systems. Voice activity detection (VAD) detects the presence of speech in a noisy signal. It is typically used as a pre-processing step and plays an important role in ASR systems. This dissertation studies the principles of VAD methods, focusing on deep-learning-based approaches, and proposes three supervised learning algorithms for VAD:

1. A two-stage training method for the convolutional, long short-term memory, fully connected deep neural network (CLDNN). CLDNN is a state-of-the-art deep learning model for VAD. This work analyzes its structural characteristics and proposes training it in two stages, a non-sequential stage and a sequential stage, to improve data utilization.

2. Joint training of VAD with the related task of speech enhancement, using self-encoding features from the enhancement task as auxiliary features. This work analyzes three types of joint training algorithms for VAD and speech enhancement, improves the joint training method with self-encoding auxiliary features, and extends the joint methods. An algorithm for automatically adjusting the weight hyperparameters is also proposed.

3. An improved VAD algorithm based on the likelihood ratio test (LRT). LRT-based VAD has two defects: the parameter estimates are inaccurate, and the decision threshold must be set manually. To address both, we design an algorithm that combines statistical signal processing with deep learning: it estimates the parameters using time-frequency masking and computes the threshold dynamically with a global average pooling (GAP) layer. Systematic experiments show that each of these two components effectively improves on the traditional signal-processing baseline. Compared with an end-to-end deep learning method of similar model size, the proposed method shows clear advantages.
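To make the third contribution concrete, the sketch below shows a frame-level likelihood ratio test VAD in which the statistical parameters come from a time-frequency mask and the threshold comes from global average pooling. It is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes the classic Gaussian LRT formulation, where the per-bin log likelihood ratio is γξ/(1+ξ) − log(1+ξ) with a posteriori SNR γ and a priori SNR ξ. The mask is taken as given (in the thesis it would come from a trained network). The mask-weighted noise/speech estimates and the affine pooled threshold with fixed parameters `w` and `b` are hypothetical stand-ins for the learned components.

```python
import numpy as np


def lrt_vad(power_spec, mask, w=1.0, b=0.0, eps=1e-10):
    """Likelihood ratio test VAD with mask-based parameter estimates.

    power_spec: (T, K) array of |X(t, k)|^2 (frames x frequency bins).
    mask: (T, K) array in [0, 1]; ASSUMED speech-presence mask, e.g. from a DNN.
    w, b: HYPOTHETICAL fixed affine parameters applied to the pooled statistic
          (a learned GAP layer would train these instead).
    Returns (decisions, frame_llr, threshold).
    """
    # Noise PSD per bin: average energy weighted by the noise mask (1 - mask).
    # This estimation rule is an assumption for illustration.
    noise_weight = 1.0 - mask
    lam_n = (noise_weight * power_spec).sum(axis=0) / (noise_weight.sum(axis=0) + eps)
    lam_n = np.maximum(lam_n, eps)

    # Speech variance per frame and bin: mask-weighted energy (assumption).
    lam_s = mask * power_spec

    gamma = power_spec / lam_n  # a posteriori SNR
    xi = lam_s / lam_n          # a priori SNR

    # Per-bin log likelihood ratio under the Gaussian speech/noise model.
    log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)

    # Frame statistic: geometric mean of the bin ratios = mean of their logs.
    frame_llr = log_lr.mean(axis=1)

    # Dynamic threshold: global average pooling over all frames, then an affine map.
    threshold = w * frame_llr.mean() + b
    return frame_llr > threshold, frame_llr, threshold


if __name__ == "__main__":
    # Synthetic example: unit-power noise, with 100x louder "speech" in frames 10-19.
    spec = np.ones((30, 8))
    spec[10:20] = 100.0
    m = np.zeros((30, 8))
    m[10:20] = 1.0
    decisions, llr, thr = lrt_vad(spec, m)
    print(np.flatnonzero(decisions))
```

Because the threshold is derived from the signal's own pooled statistics rather than set by hand, it adapts to each recording, which addresses the manual-threshold defect described above. With `w` and `b` fixed as here, it is only a crude proxy for a trained pooling layer.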

Keywords/Search Tags:

voice activity detection, deep learning, training strategy, joint training, likelihood ratio test


Related items

1	Generalized Likelihood Ratio Test For Voice Activity Detection Based On Source-Filter Model
2	Research And Realization Of Voice Activity Detection Based On Multiple Observation Likelihood Ratio System
3	Research On Voice Activity Detection Technology In High Noisy Environment
4	Voice Activity Detection With Deep Learning
5	Research On Voice Activity Detection Based On Deep Learning
6	Study On Soft Voice Activity Detection Based On Generalized Gamma Distribution In Transformed Domain
7	Research And Application On Adversarial Training Defense Strategy
8	A Study Of Efficient Training Approaches To Deep Learning Models
9	Large-scale Visual Relationship Detection Based On Hierarchical Training Strategy
10	Research On The Design Of Human Resource Training And Exploration System For Power Plant A