
Research On Loss Functions In Neural Networks For Speaker Recognition

Posted on: 2020-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2428330575998947
Subject: Engineering

Abstract/Summary:
Speaker recognition is a biometric technique that identifies a speaker by extracting distinctive features that carry speaker-specific information. With the rapid development of deep learning and neural networks, speaker recognition performance continues to reach new levels. When deep neural networks are trained for speaker recognition, the loss function plays an important role in how well the network converges. Existing loss functions have known shortcomings: Triplet Loss cannot combine speakers' embedding features into batch matrices for training; Generalized End-to-End (GE2E) Loss needs a large amount of audio to converge; and Angular SoftMax Loss performs poorly at aggregating embeddings from the same speaker. Among speaker recognition methods, the dominant baseline is the standard i-vector system built on the Kaldi AISHELL V1 recipe, which expresses audio features effectively and achieves good recognition performance.

Building on this prior work, we study and optimize loss functions to address the shortcomings of Triplet Loss, GE2E and Angular SoftMax Loss in multi-speaker classification, and propose an improved method that combines two loss functions. Using a weighted adjustment strategy, we introduce a new weight value and readjust the weights according to the distances between embeddings of the same speaker. We verify the feasibility of the improved loss function by testing the performance of a system built on the new model. Experiments on a large-scale dataset of more than 1,000 hours show that, compared with the loss functions of the original neural network, the improved loss function gives the speaker recognition system a 64% relative improvement. The best equal error rate (EER) reaches 0.01 on this large-scale dataset.
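The abstract does not say which two losses are combined or how the weights are re-adjusted from intra-speaker distances, so the following is only a rough illustration of the general idea of mixing a distance-based loss with a classification loss. It shows a standard triplet loss on Euclidean embedding distances and a fixed-weight sum with a plain softmax cross-entropy. The function names, the default margin of 0.2 and the weight `alpha` are assumptions for illustration; this is not the thesis's method, and neither GE2E nor Angular SoftMax is implemented here.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Standard triplet hinge: max(0, d(a, p) - d(a, n) + margin).

    anchor and positive belong to the same speaker, negative to another one,
    so the loss is zero once the impostor is at least `margin` farther away.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Plain softmax cross-entropy for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def combined_loss(metric_term: float, class_term: float, alpha: float = 0.5) -> float:
    """Hypothetical fixed-weight mix of two loss terms (alpha is an assumption)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * metric_term + (1.0 - alpha) * class_term


if __name__ == "__main__":
    anchor, positive, negative = [0.0, 0.0], [0.5, 0.0], [1.0, 0.0]
    t = triplet_loss(anchor, positive, negative, margin=0.7)  # about 0.2
    c = softmax_cross_entropy([0.0, 0.0], target=0)           # log(2)
    print(combined_loss(t, c, alpha=0.5))
```

A fixed `alpha` is the simplest choice; the thesis instead describes re-adjusting the weights from same-speaker embedding distances, which would replace the constant with a value computed per batch.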
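The reported result is an equal error rate: the operating point at which the false-acceptance rate (impostor trials accepted) equals the false-rejection rate (genuine trials rejected). A minimal sketch of computing it from verification scores, assuming higher scores mean "same speaker" and that a trial is accepted when its score is at or above the threshold. It simply sweeps the observed scores as candidate thresholds; toolkits such as Kaldi compute EER with their own routines, and this function name is hypothetical.

```python
from typing import Sequence, Tuple


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (eer, threshold) for a set of verification trials.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_target = sum(1 for y in labels if y == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    best_gap = float("inf")
    best = (1.0, 0.0)
    # Every observed score, plus one value above the maximum, is a candidate threshold.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    for thr in candidates:
        false_accepts = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
        false_rejects = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest; report their mean.
        if gap < best_gap:
            best_gap = gap
            best = ((far + frr) / 2.0, thr)
    return best


if __name__ == "__main__":
    print(equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # (0.5, 0.6)
```

Read this way, an EER of 0.01 means that at the chosen threshold roughly 1% of impostor trials are accepted and roughly 1% of genuine trials are rejected. Production implementations usually interpolate the ROC curve rather than taking the nearest discrete threshold.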
Keywords/Search Tags: Speaker recognition, Deep learning, Neural network, Loss function