
A Study On The Generative Modelling For Speaker Verification Based On Deep Neural Network

Posted on: 2021-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: J F Zhou
GTID: 2518306020450464
Subject: Circuits and Systems
Abstract/Summary:
Speaker recognition systems based on deep neural networks outperform traditional probabilistic generative modeling approaches, owing to their capacity for nonlinear modeling when large amounts of labeled data are available. In real application scenarios, however, the generalization ability of neural-network-based speaker recognition systems is seriously inadequate. On the one hand, system performance drops dramatically in complex test environments; on the other hand, a model trained by classifying in-set speakers does not generalize well to out-of-set data. This thesis focuses on the generalization performance of existing speaker recognition models and investigates generative speaker modeling approaches based on deep neural networks. The main contributions are as follows.

(1) For noise robustness, a speaker embedding extraction method based on a generative adversarial network framework is proposed. We first design a multi-task, multi-category speaker feature learning framework with an adversarial architecture, which improves the noise robustness of the speaker representation by using an adversarial mechanism to suppress the influence of non-target information (noise) on target information (speaker identity). In addition, to address the training difficulties and labeling requirements of this multi-task, multi-category adversarial framework, we further design a noise-robust learning framework based on generative adversarial networks (GANs). It fully exploits the matched-generation capability of GANs to weaken the influence of non-target information, strengthens the noise immunity of the speaker features, and improves speaker recognition performance in noisy environments.

(2) A PLDA modeling approach based on deep neural networks is proposed for modeling the distribution of speaker embeddings. Unlike traditional PLDA training methods, this approach links the modeling assumptions of PLDA to a deep network architecture and reimplements the generative process of PLDA with a variational autoencoder (VAE). On VoxCeleb1, a test set drawn from practical scenarios, VAE-PLDA achieves better recognition performance than traditional PLDA and improves the generalization of existing speaker models in complex scenarios.

(3) A channel information enhancement method based on an attention mechanism is proposed. We analyze how much different channels contribute to speaker characterization and use a Squeeze-and-Excitation (SE) network to learn the contribution of each channel dimension to speaker representation learning. Based on these learned contributions, useful channel information is strengthened and less useful information is weakened, yielding more discriminative speaker representations.
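Contribution (2) reimplements the generative process of PLDA with a VAE. For reference, a common simplified form of PLDA generates an embedding as x = m + V y + ε, where the latent identity y ~ N(0, I) is shared by all utterances of one speaker and the residual ε ~ N(0, diag(σ²)) is drawn independently per utterance. The sketch below samples from that classical linear-Gaussian model only; it is not the thesis's VAE-PLDA, and the names `sample_plda`, `m`, `V`, `sigma`, the diagonal residual covariance, and the toy dimensions are all illustrative assumptions.

```python
import random

def sample_plda(m, V, sigma, n_speakers, n_utts, seed=0):
    """Sample from a simplified PLDA model (assumed form, not the thesis's):
    x = m + V @ y + eps, with y ~ N(0, I) per speaker and
    eps ~ N(0, diag(sigma^2)) per utterance."""
    rng = random.Random(seed)
    dim, latent = len(m), len(V[0])
    data = []  # list of (speaker_id, embedding) pairs
    for spk in range(n_speakers):
        # One latent identity vector, shared by every utterance of this speaker
        y = [rng.gauss(0.0, 1.0) for _ in range(latent)]
        centre = [m[d] + sum(V[d][k] * y[k] for k in range(latent)) for d in range(dim)]
        for _ in range(n_utts):
            # Independent residual noise per utterance (diagonal covariance)
            x = [centre[d] + rng.gauss(0.0, sigma[d]) for d in range(dim)]
            data.append((spk, x))
    return data

# Toy usage: 3-dim embeddings with a 1-dim speaker subspace
m = [0.0, 0.0, 0.0]
V = [[2.0], [1.0], [0.5]]
sigma = [0.1, 0.1, 0.1]
data = sample_plda(m, V, sigma, n_speakers=4, n_utts=5)
print(len(data))  # 20
```

Because the residual spread (0.1) is small relative to the speaker subspace, utterances of the same speaker cluster tightly around their shared centre; PLDA scoring exploits exactly this between-speaker versus within-speaker structure.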
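Contribution (3) relies on a Squeeze-and-Excitation block to reweight channels. A minimal forward-pass sketch follows, assuming a frame-level feature map of shape channels × frames, average pooling over time for the "squeeze" step, and fixed, hand-picked weights `W1` and `W2` in place of trained ones. The function name `se_block`, the shapes, and the reduction ratio r = 2 are illustrative assumptions, not the thesis's configuration.

```python
import math

def se_block(feat, W1, W2):
    """Squeeze-and-Excitation forward pass (illustrative sketch).
    feat: C x T feature map (C channels, each a list of T frame values)
    W1:   (C/r) x C reduction weights; W2: C x (C/r) expansion weights.
    Returns the reweighted feature map and the per-channel gates."""
    # Squeeze: average over time gives one descriptor per channel
    z = [sum(ch) / len(ch) for ch in feat]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid, one gate in (0, 1) per channel
    h = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in W1]
    s = [1.0 / (1.0 + math.exp(-sum(w * hc for w, hc in zip(row, h)))) for row in W2]
    # Scale: multiply every frame of channel c by its gate s[c]
    out = [[s[c] * v for v in feat[c]] for c in range(len(feat))]
    return out, s

feat = [[1.0, 3.0], [1.0, 1.0], [2.0, 2.0], [4.0, 0.0]]  # C=4 channels, T=2 frames
W1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]        # reduction ratio r=2
W2 = [[2.0, 0.0], [-2.0, 0.0], [0.0, 0.0], [0.0, 2.0]]
out, gates = se_block(feat, W1, W2)
print([round(g, 3) for g in gates])  # [0.982, 0.018, 0.5, 0.982]
```

Since every gate lies in (0, 1), the block emphasizes some channels relative to others rather than adding new information; in a trained network, `W1` and `W2` are learned jointly with the speaker classifier, so the gates reflect each channel's learned usefulness.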
Keywords: Speaker verification, Adversarial learning, Generative model, PLDA