Speaker Recognition Algorithm Based On Frequency Band Attention And Multi-metric Learning

Posted on: 2022-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Huang
Full Text: PDF
GTID: 2518306539980599
Subject: Electronics and Communications Engineering
Abstract/Summary:
Speaker recognition is a biometric technology that identifies a speaker from the individual characteristics carried in the speaker's voice signal. The performance of a speaker recognition system can be improved in several directions. For example, the most important features can be selected from the speaker's voice features to make them more discriminative, or a better-performing loss function can be used to train the feature extraction network. This thesis uses an attention model and multi-metric learning to improve the performance of the speaker recognition system. The major work of this thesis is as follows:

1) A frequency band attention model is proposed to recalibrate the speaker's voice features. First, a one-dimensional convolution gathers the information of different frame features on the same frequency band; the convolution assigns different weights to different frame features. As a result, frequency bands in which speaker-specific features are concentrated are emphasized, and frequency bands in which they are not concentrated are de-emphasized. Then, two fully connected layers with nonlinear operations reduce and restore the feature dimension, respectively. In this way, the information gathered in the different frequency bands is mapped to a set of frequency band weights. The recalibrated speaker features are then fed into a residual network with a squeeze-and-excitation network. The residual network extracts the deep features of the speaker, while the squeeze-and-excitation network captures the correlation between the feature maps of different channels in a convolutional layer.

2) A multi-metric learning method is proposed to train a discriminative feature extraction network. First, the extracted speaker features are fed into the residual network with a squeeze-and-excitation network. Then three loss functions are combined into a multi-metric learning loss function to train the network. The cross-entropy loss function speeds up the training of the network; the triplet center loss function reduces the distance between samples of the same class and increases the distance between samples of different classes; the additive margin softmax loss function enlarges the margin between classes uniformly for all classes.
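
The frequency band attention steps in item 1) can be illustrated with a small, dependency-free sketch. It is not the thesis implementation: the function name `band_attention`, the choice of a convolution kernel that spans all frames, the ReLU and sigmoid nonlinearities, and the toy sizes are all assumptions made for illustration, since the abstract only states that a one-dimensional convolution is followed by two fully connected layers with nonlinear operations.

```python
import math
import random


def band_attention(features, frame_kernel, w_reduce, w_restore):
    """Reweight a (bands x frames) feature map with one weight per band.

    features:     F rows, each holding the T frame values of one frequency band
    frame_kernel: T weights; assumed here to be a 1-D convolution kernel that
                  spans all frames, so each band is reduced to a single value
    w_reduce:     (F/r) x F matrix of the first fully connected layer
    w_restore:    F x (F/r) matrix of the second fully connected layer
    """
    # Step 1: gather the frame information of each band (weighted sum over frames).
    summary = [sum(k * x for k, x in zip(frame_kernel, row)) for row in features]

    # Step 2: reduce the dimension; ReLU is an assumed nonlinearity.
    hidden = [max(0.0, sum(w * s for w, s in zip(w_row, summary)))
              for w_row in w_reduce]

    # Step 3: restore the dimension; the sigmoid (assumed) keeps weights in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(w_row, hidden))))
               for w_row in w_restore]

    # Step 4: recalibrate the input by scaling every band with its weight.
    recalibrated = [[wt * x for x in row] for wt, row in zip(weights, features)]
    return recalibrated, weights


# Toy example with hypothetical sizes: 4 bands, 3 frames, reduction to 2 units.
rng = random.Random(0)
num_bands, num_frames, reduced = 4, 3, 2
features = [[rng.uniform(-1, 1) for _ in range(num_frames)]
            for _ in range(num_bands)]
frame_kernel = [rng.uniform(-0.5, 0.5) for _ in range(num_frames)]
w_reduce = [[rng.uniform(-0.5, 0.5) for _ in range(num_bands)]
            for _ in range(reduced)]
w_restore = [[rng.uniform(-0.5, 0.5) for _ in range(reduced)]
             for _ in range(num_bands)]

recalibrated, band_weights = band_attention(
    features, frame_kernel, w_reduce, w_restore)
```

In a trained model the kernel and the two weight matrices would be learned, and the recalibrated features would then be passed to the residual network; here they are random values used only to show the data flow.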
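
The combination of the three losses in item 2) can likewise be sketched for a single training sample. The abstract does not say how the losses are weighted or parameterized, so the weights `w_tcl` and `w_ams`, the scale and margin values, and the use of squared Euclidean distance to class centers are assumptions chosen for illustration rather than the settings used in the thesis.

```python
import math


def log_softmax_at(logits, target):
    """Log-probability of the target class under a numerically stable softmax."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[target] - log_sum


def cross_entropy(logits, target):
    """Standard softmax cross-entropy for one sample."""
    return -log_softmax_at(logits, target)


def am_softmax(cosines, target, scale=30.0, margin=0.35):
    """Additive margin softmax: subtract a margin from the target cosine only."""
    adjusted = [scale * (c - margin) if j == target else scale * c
                for j, c in enumerate(cosines)]
    return -log_softmax_at(adjusted, target)


def triplet_center(embedding, centers, target, margin=1.0):
    """Triplet center loss: pull toward the own center, push from the nearest other.

    Squared Euclidean distance is an assumed choice of distance.
    """
    dists = [sum((e - c) ** 2 for e, c in zip(embedding, center))
             for center in centers]
    nearest_other = min(d for j, d in enumerate(dists) if j != target)
    return max(0.0, dists[target] + margin - nearest_other)


def multi_metric_loss(logits, cosines, embedding, centers, target,
                      w_tcl=1.0, w_ams=1.0):
    """Weighted sum of the three losses; the weights are hypothetical."""
    return (cross_entropy(logits, target)
            + w_tcl * triplet_center(embedding, centers, target)
            + w_ams * am_softmax(cosines, target))


# Toy sample with two speakers (class 0 is the true speaker).
logits = [2.0, 0.5]
cosines = [0.8, 0.1]
embedding = [0.9, 0.1]
centers = [[1.0, 0.0], [0.0, 1.0]]
total = multi_metric_loss(logits, cosines, embedding, centers, target=0)
```

Each term plays the role described in the abstract: the cross-entropy term drives fast convergence, the triplet center term acts on distances to class centers, and the additive margin term applies the same margin to every class.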
Keywords/Search Tags: Speaker recognition, Frequency band attention, Residual network, Multi-metric learning