Speaker Recognition Based On Additive Margin Loss

Posted on: 2020-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: L Fan
Full Text: PDF
GTID: 2428330575958252
Subject: Computer technology
Abstract/Summary:
With the development of deep learning, speaker recognition research has undergone a paradigm shift from i-vector methods to deep neural networks. However, existing speaker recognition methods based on deep neural networks are still not discriminative enough. Deep neural networks provide strong modeling capacity, and the training criterion is important for exploiting that capacity. This thesis analyses and compares recent work on additive-margin-based training criteria for deep learning and proposes three new additive-margin methods that improve accuracy or efficiency on three speaker recognition tasks: speaker verification, speaker identification and speaker retrieval.

1. Existing classification training criteria cannot provide enough discrimination for speaker verification. This thesis proposes Ensemble Additive Margin Embedding (EAME) to tackle this problem. EAME introduces ensemble embedding layers for speaker verification: it employs an additive margin to improve the discrimination of the speaker verification model, and ensemble embedding layers to enhance the robustness of the deep speaker representations. Experiments show that EAME outperforms existing speaker verification methods and achieves state-of-the-art performance.

2. This thesis proposes the Ensemble Additive Margin Classifier (EAMC) to improve the discrimination of speaker identification models. EAMC introduces ensemble classifiers and hard example mining for speaker identification. It employs an additive margin to improve the discrimination of the model, and hard example mining to make the model focus on hard training samples. The ensemble classifiers in EAMC improve the capacity of the model and smooth the change in difficulty during the training procedure. Experiments show that EAMC outperforms existing speaker identification methods and achieves state-of-the-art performance.

3. Existing hash-based speaker retrieval methods mostly adopt a two-stage learning procedure, which learns hash codes after extracting representations with i-vector. On the one hand, this two-stage procedure is harmful to hash learning, because feature learning receives no feedback from hash-code learning. On the other hand, the performance of the learnt hash codes is limited by i-vector. This thesis proposes Deep Additive Margin Hashing (DAMH) to tackle these problems. More specifically, DAMH employs deep hashing to enhance the feedback between feature learning and binary code learning. Furthermore, DAMH adopts an additive margin to improve the discrimination of the learnt binary codes. As far as we know, DAMH is the first deep hashing method for speaker retrieval. Experiments show that DAMH outperforms existing speaker retrieval methods and achieves state-of-the-art performance.
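
The abstract does not spell out the additive margin criterion that all three methods build on. As a rough illustration only, the sketch below assumes the commonly used additive-margin softmax formulation, in which the cosine similarity to the true speaker's class weight is reduced by a margin before scaling and applying softmax cross-entropy. The function name `am_softmax_loss` and the default `scale` and `margin` values are hypothetical choices for this example and are not taken from the thesis.

```python
import math


def am_softmax_loss(embedding, class_weights, label, scale=30.0, margin=0.35):
    """Additive-margin softmax loss for one utterance embedding.

    Assumed formulation (not confirmed by the abstract):
        loss = -log( exp(s * (cos_y - m)) /
                     (exp(s * (cos_y - m)) + sum_{j != y} exp(s * cos_j)) )
    `scale` (s) and `margin` (m) are illustrative defaults.
    """

    def normalise(vector):
        # Scale a vector to unit L2 norm so dot products become cosines.
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    unit_embedding = normalise(embedding)
    # Cosine similarity between the embedding and every speaker's class weight.
    cosines = [
        sum(a * b for a, b in zip(unit_embedding, normalise(weight)))
        for weight in class_weights
    ]
    # Subtract the margin from the true speaker's cosine only, then scale.
    logits = [
        scale * (cos - margin) if index == label else scale * cos
        for index, cos in enumerate(cosines)
    ]
    # Numerically stable log-sum-exp over all speakers.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]


# Toy usage: a 2-dimensional embedding and two speaker classes.
weights = [[1.0, 0.0], [0.0, 1.0]]
sample = [0.9, 0.1]
loss_without_margin = am_softmax_loss(sample, weights, label=0, margin=0.0)
loss_with_margin = am_softmax_loss(sample, weights, label=0, margin=0.35)
```

With a positive margin, the same embedding incurs a larger loss than with plain normalised softmax, so training has to push it closer to its own speaker's class weight. That is the sense in which a margin can increase discrimination.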
Keywords/Search Tags:speaker recognition, additive margin, ensemble strategy, hashing