System Design And Robust Optimization Of Speaker Recognition Based On ASV-Subtools

Posted on: 2021-05-07  Degree: Master  Type: Thesis
Country: China  Candidate: M Zhao  Full Text: PDF
GTID: 2518306017959869  Subject: Computer technology
Abstract/Summary:
X-vector neural networks perform very well in speaker recognition and outperform traditional i-vector systems, especially for text-independent recognition on medium and short utterances. However, x-vectors still generalize poorly, which leads to weak performance under real-world conditions. To improve the generalization of x-vectors to unseen data and speakers, this thesis optimizes and analyzes both the training data and the loss function. In addition, a PyTorch-based deep learning toolkit for speaker recognition is developed independently, and an instant data augmentation method is proposed to make the optimized model more robust. The main work and contributions of this thesis are as follows.

(1) A speaker recognition toolkit named ASV-Subtools is developed on top of Kaldi and PyTorch. ASV-Subtools is mainly used to replace the neural network training in Kaldi. On the one hand, although Kaldi is a popular toolkit in the speech field, its support for neural networks is limited and its source code is difficult to modify, which lowers research efficiency. On the other hand, PyTorch, as a popular deep learning framework, could make up for the shortcomings of Kaldi's training, but the speaker recognition field still lacks a systematic, efficient, scalable and open-source set of training tools. ASV-Subtools is therefore developed to make experiments more efficient by connecting Kaldi's data processing with neural network training in PyTorch.

(2) An instant data augmentation method named Inverted SpecAugment is proposed to improve the generalization of x-vectors to unseen data. SpecAugment randomly masks frequency and time features during training, which is very useful for reducing the model's dependence on adjacent features. However, it shares a problem with Dropout: the masking operation causes a gap in the mean between training and testing. Inverted SpecAugment fixes this problem, and an additional random multi-drop algorithm is adopted to make the method more robust. Compared with the baseline system, this method improves performance on the VoxCeleb1 evaluation set by 17% in this study.

(3) Margin-based losses are studied in detail, and AM-Softmax loss is used in place of Softmax loss to improve the generalization of x-vectors to unseen classes. When Inverted SpecAugment is also adopted, performance on the VoxCeleb1 evaluation set is further improved by 21% compared with the baseline.
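
As a rough illustration of the margin loss mentioned in (3), the following is a minimal pure-Python sketch of the standard AM-Softmax formulation for a single sample. It is not taken from ASV-Subtools: the function name am_softmax_loss, its parameters, and the default margin (0.2) and scale (30.0) are assumptions chosen for illustration, not values reported in the thesis. The embedding and the class weight vectors are L2-normalized, the margin is subtracted from the target-class cosine only, and the scaled cosines are passed through an ordinary softmax cross-entropy.

```python
import math


def am_softmax_loss(embedding, class_weights, target, margin=0.2, scale=30.0):
    """AM-Softmax loss for one sample (illustrative sketch, hypothetical API).

    embedding:     list of floats, the speaker embedding
    class_weights: list of weight vectors, one per training speaker
    target:        index of the true speaker
    margin, scale: assumed hyperparameters, not taken from the thesis
    """
    def l2_normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    e = l2_normalize(embedding)
    # Cosine similarity between the embedding and every class weight vector.
    cosines = [
        sum(a * b for a, b in zip(e, l2_normalize(w))) for w in class_weights
    ]
    # Subtract the additive margin from the target-class cosine only, then scale.
    logits = [
        scale * (c - margin if j == target else c) for j, c in enumerate(cosines)
    ]
    # Numerically stable log-sum-exp for the softmax denominator.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    # Cross-entropy: negative log-probability of the target class.
    return log_sum_exp - logits[target]
```

With margin=0 this reduces to a softmax cross-entropy over scaled cosine similarities; a positive margin makes the loss larger for the same embedding, so the network has to separate speakers by a wider angular gap during training.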
Keywords/Search Tags:Speaker Recognition, ASV-Subtools, Instant Data Augmentation, AM-Softmax Loss