
Speaker Recognition Algorithm Based On Deep Learning

Posted on: 2021-05-31  Degree: Master  Type: Thesis
Country: China  Candidate: L Zheng  Full Text: PDF
GTID: 2428330602478820  Subject: Electronic and communication engineering
Abstract/Summary:
Speaker recognition, also known as voiceprint recognition, is a technology for determining a speaker's identity from voiceprint characteristics. It is widely used in various fields and has practical research value. With the improvement of computer hardware performance, voiceprint recognition based on deep learning has become one of the mainstream methods. However, deep learning approaches often train a single speaker classifier to predict labels, or use a simple similarity decision method for model matching, which leaves the learned voiceprint features with insufficient discriminative ability. To extract voiceprint features with strong discriminative ability, this thesis improves the traditional loss functions; network models trained under the supervision of the improved loss functions effectively improve speaker recognition accuracy. The content of this thesis is as follows:

1. First, low-dimensional speaker features are extracted from the last hidden layer of a dense network (DenseNet), and the proposed ICTL loss function is used as the objective function of that layer. ICTL is composed of triplet loss and an improved triplet loss (ICL), and it computes the similarity loss between the triplet features extracted at the last hidden layer. Softmax Loss is then used to compute the error between the predicted identity and the true identity of each triplet sample at the final classification layer of DenseNet, with ICTL serving as an auxiliary loss function to Softmax Loss. Under the supervision of ICTL, the voiceprint features output by the last hidden layer follow a highly correlated distribution: samples from the same speaker lie close to each other, and samples from different speakers lie far apart. As a result, when the triplet sample features pass through the final classification layer of DenseNet, the speaker recognition performance is greatly improved.

2. DenseNet is again used as the voiceprint feature extractor, and the voiceprint features of the last hidden layer are extracted. The idea of Triplet Center Loss (TCL) is introduced and improved upon: two TCL variants with added intra-class constraints are proposed as the supervision function of the last hidden layer of DenseNet. These further strengthen the constraint on the similarity between the extracted voiceprint features and the feature center of samples belonging to the same speaker during training, which improves the discriminative ability of the voiceprint features and the recognition performance of the DenseNet classification layer.
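
The abstract does not give formulas for ICTL, ICL, or the two improved TCL variants, so the sketch below illustrates only the standard building blocks they start from: the plain triplet loss, the plain triplet center loss, and a softmax cross-entropy loss with the triplet term added as an auxiliary. All function names, the margin, the auxiliary weight, and the use of squared Euclidean distance are assumptions made for illustration, not the thesis's definitions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss (not ICTL): the anchor should be closer to the
    same-speaker sample than to the different-speaker sample by `margin`."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def triplet_center_loss(feature, label, centers, margin=1.0):
    """Standard triplet center loss (not the improved variants): the feature
    should be closer to its own class center than to the nearest other
    center by `margin`. `centers` maps speaker label -> center vector."""
    d_own = sq_dist(feature, centers[label])
    d_other = min(sq_dist(feature, c) for k, c in centers.items() if k != label)
    return max(0.0, d_own + margin - d_other)


def softmax_cross_entropy(logits, label):
    """Softmax loss for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def joint_loss(logits, label, anchor, positive, negative, weight=0.5, margin=1.0):
    """Softmax loss on the classification layer plus a weighted triplet term
    on hidden-layer features. `weight` is a hypothetical balancing factor."""
    aux = triplet_loss(anchor, positive, negative, margin)
    return softmax_cross_entropy(logits, label) + weight * aux
```

In a training loop, `anchor`, `positive`, and `negative` would be the last-hidden-layer features of a triplet and `logits` the classification-layer output for the anchor; the auxiliary term is zero once the triplet already satisfies the margin, so only violating triplets contribute to it.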
Keywords/Search Tags: Speaker recognition, Dense network, Triplet loss, Triplet center loss