
Research On Deep Learning Based Speaker Recognition Algorithm

Posted on: 2021-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: T. Y. Bian
Full Text: PDF
GTID: 2428330623484144
Subject: Control theory and control engineering

Abstract
With the application and popularization of intelligent terminal devices, biometric recognition technology has come to play an increasingly important role in authentication scenarios owing to its convenience. Speaker recognition is a form of biometric recognition that identifies people by their voice signals, and it is widely used in scenarios such as criminal investigation, financial risk control, and human-computer interaction on voice terminal devices. By application scenario, it divides into two tasks: speaker verification and speaker identification. Depending on whether the spoken content is restricted, speaker recognition is further divided into two categories: text-dependent and text-independent. This thesis focuses on the more challenging text-independent setting and evaluates both the speaker verification and the speaker identification task.

This thesis proposes an end-to-end speaker recognition paradigm, comprising neural network models based on the attention mechanism and model training methods based on metric learning. The proposed network combines a residual convolutional neural network with the attention mechanism: it applies attention to high-level feature extraction, and it also introduces an attention-based temporal pooling method that learns to weight the features of different speech segments adaptively. Building on triplet loss, the thesis proposes a novel online hard-sample mining method that unifies the constraints on sample pairs from the same speaker, together with a stable training scheme that addresses the notoriously difficult optimization of triplet loss. Trained on the VoxCeleb1 dataset, the proposed scheme achieves an equal error rate of 5.3% on speaker verification, surpassing the popular i-vector and x-vector models. Moreover, the scheme is end-to-end and requires no separate scoring back-end, whereas both the i-vector and x-vector models rely on a separately trained PLDA model for scoring. When trained on the VoxCeleb2 dataset, the scheme reduces the equal error rate on the VoxCeleb1 test set to 4.05%, outperforming ResNet-34 and ResNet-50 trained with contrastive loss, while the complexity of the proposed network is much lower than that of ResNet-34.

For general multi-class classification tasks, the thesis proposes a training paradigm that combines a metric-learning loss function with softmax cross-entropy: the bottleneck features of the network are trained with the CRL loss function described in the thesis, and the final fully connected classifier layer is trained with softmax cross-entropy. These two steps can be performed simultaneously by cutting off gradient propagation between the bottleneck features and the classification layer. On the VoxCeleb1 dataset, this method further improves Top-1 accuracy by 3.6%.
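The attention-based temporal pooling described in the abstract can be sketched in plain NumPy. The tanh scoring function and the parameter names `w`, `b`, `v` are illustrative assumptions, not the exact architecture from the thesis; the point is that a learned scalar score per frame is softmax-normalized over time and used to weight the frame features.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pooling(frames, w, b, v):
    """Collapse frame-level features into one utterance embedding.

    frames: (T, D) frame-level features from the convolutional front end.
    w, b, v: attention parameters (hypothetical shapes (D, A), (A,), (A,)).
    Returns the (D,) weighted embedding and the (T,) attention weights.
    """
    h = np.tanh(frames @ w + b)   # (T, A) hidden attention representation
    scores = h @ v                # (T,) one scalar score per frame
    alpha = softmax(scores)       # attention weights over time, sum to 1
    return alpha @ frames, alpha  # adaptively weighted temporal average
```

Frames that receive higher attention scores contribute more to the utterance embedding, which is what lets the model down-weight silence or noisy segments.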
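The abstract does not spell out the exact online hard-sample mining rule, so as a stand-in the widely used "batch-hard" variant of triplet mining is sketched below: within each mini-batch, every anchor is paired with its farthest same-speaker sample and its closest different-speaker sample. This is an illustration of online mining in general, not the thesis's specific method.

```python
import numpy as np

def pairwise_sq_dist(emb):
    """Squared Euclidean distances between all embeddings in a batch."""
    sq = (emb ** 2).sum(axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    return np.maximum(d, 0.0)  # clamp tiny negatives from rounding

def batch_hard_triplet_loss(emb, labels, margin=0.3):
    """Triplet loss with online hard mining inside one mini-batch.

    emb: (N, D) embeddings; labels: (N,) speaker ids. Assumes every
    sample has at least one positive and one negative in the batch.
    """
    d = pairwise_sq_dist(emb)
    same = labels[:, None] == labels[None, :]
    # hardest positive: farthest sample of the same speaker
    hardest_pos = np.where(same, d, -np.inf).max(axis=1)
    # hardest negative: closest sample of a different speaker
    hardest_neg = np.where(~same, d, np.inf).min(axis=1)
    return np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean()
```

When same-speaker embeddings cluster tightly and different speakers are well separated, the hinge is inactive and the loss is zero; degenerate embeddings are penalized by the margin.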
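The gradient cut between the bottleneck features and the classifier layer can be sketched as follows: the embeddings enter the classifier update as constants (the NumPy analogue of `detach()` in an autograd framework), so only the classifier weights `W` receive softmax cross-entropy gradients while the metric-learning loss shapes the features. Names, sizes, and the learning rate are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Row-wise, numerically stable softmax."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true classes."""
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def classifier_step(features, labels, W, lr=0.5):
    """One softmax cross-entropy update of the classifier weights W.

    `features` is treated as a constant (gradient propagation is cut),
    so no gradient flows back into the network that produced it.
    """
    probs = softmax(features @ W)
    onehot = np.eye(W.shape[1])[labels]
    grad_W = features.T @ (probs - onehot) / len(labels)  # dL/dW only
    return W - lr * grad_W
```

Because the classifier sees the features as fixed inputs, the two training objectives can run simultaneously without interfering with each other's gradients.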
Keywords: Speaker Recognition, Deep Learning, Attention Mechanism, Triplet Loss