Speaker Recognition Algorithm Based On Frequency Band Attention And Multi-metric Learning

Posted on:2022-03-14

Degree:Master

Type:Thesis

Country:China

Candidate:C Huang

Full Text:PDF

GTID:2518306539980599

Subject:Electronics and Communications Engineering

Abstract/Summary:

Speaker recognition is a biometric technology that identifies a speaker from the individual characteristics carried in the speaker's voice signal. The performance of a speaker recognition system can be improved in several ways: for example, by selecting the most informative components of the speaker's voice features so that the features become more discriminative, or by training the feature extraction network with a better-performing loss function. This thesis uses an attention model and multi-metric learning to improve the performance of the speaker recognition system. The main contributions are as follows:

1) A frequency band attention model is proposed to recalibrate the speaker's voice features. First, a one-dimensional convolution gathers the information of the different frames within each frequency band, and in doing so assigns different weights to different frames. As a result, the frequency bands in which speaker-specific characteristics are concentrated are emphasized, while the bands in which they are not concentrated are suppressed. Next, two fully connected layers with nonlinear activations respectively reduce and restore the feature dimension, so that the information gathered from the different frequency bands is mapped to a set of frequency band weights. The recalibrated speaker features are then fed into a residual network with squeeze-and-excitation (SE) blocks. The residual network extracts deep speaker features, and the SE blocks capture the dependencies between the channel feature maps of the convolutional layers.

2) A multi-metric learning method is proposed to train a discriminative feature extraction network. First, the extracted speaker features are fed into the residual network with SE blocks. Then three loss functions are combined into a multi-metric learning loss to train the network: the cross-entropy loss speeds up training; the triplet center loss pulls samples of the same speaker toward their class center and pushes samples of different speakers apart; and the additive margin softmax loss enlarges the inter-class margin by the same amount for every class.
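
The frequency band attention step in contribution 1) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the thesis's implementation: the input is taken to be a (bands x frames) feature map; the one-dimensional convolution is assumed to use a single kernel that spans all T frames and is shared by all bands, so each band collapses to one value with a separate weight per frame; and the reduction ratio r, the ReLU/sigmoid activations, and the names `band_attention` and `dense` are hypothetical choices modeled on squeeze-and-excitation designs.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def dense(W, v, b):
    """Fully connected layer: W @ v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def band_attention(X, kernel, kernel_bias, W1, b1, W2, b2):
    """Rescale each frequency band of X by a weight in (0, 1).

    X      : F rows (bands), each a list of T frame values
    kernel : T weights; assumed 1-D kernel spanning all frames, shared by all bands
    W1, b1 : reduction layer, F -> F // r
    W2, b2 : restoration layer, F // r -> F
    Returns the recalibrated map and the per-band weights.
    """
    # 1) 1-D convolution along time: each band collapses to one value,
    #    with a separate weight for every frame.
    z = [sum(k * x for k, x in zip(kernel, row)) + kernel_bias for row in X]
    # 2) Two fully connected layers reduce, then restore, the band dimension.
    h = relu(dense(W1, z, b1))
    s = sigmoid(dense(W2, h, b2))
    # 3) Multiply every frame of band f by its band weight s[f].
    Y = [[s_f * x for x in row] for s_f, row in zip(s, X)]
    return Y, s


# Toy example with random (untrained) parameters.
random.seed(0)
F, T, r = 8, 5, 4  # bands, frames, hypothetical reduction ratio
X = [[random.gauss(0.0, 1.0) for _ in range(T)] for _ in range(F)]
kernel = [random.gauss(0.0, 0.1) for _ in range(T)]
W1 = [[random.gauss(0.0, 0.1) for _ in range(F)] for _ in range(F // r)]
b1 = [0.0] * (F // r)
W2 = [[random.gauss(0.0, 0.1) for _ in range(F // r)] for _ in range(F)]
b2 = [0.0] * F
Y, weights = band_attention(X, kernel, 0.0, W1, b1, W2, b2)
```

The bottleneck here mirrors the excitation step of an SE block, but it acts on frequency bands rather than on convolution channels; in a trained model the kernel and dense-layer weights would be learned jointly with the downstream residual network.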

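The multi-metric loss in contribution 2) can be sketched per sample as a weighted sum of the three terms. Everything specific here is an assumption for illustration rather than the thesis's setup: the weights `lam_tc` and `lam_am`, the triplet center margin, the use of squared Euclidean distance to class centers, the AM-softmax scale s=30 and margin m=0.35 (values commonly used with that loss), and separate classifier weights for the cross-entropy and AM-softmax heads. In training, the centers and weights would be learned parameters; here they are fixed lists.

```python
import math


def log_softmax_at(logits, y):
    """Numerically stable log-softmax of `logits`, evaluated at index y."""
    m = max(logits)
    return logits[y] - (m + math.log(sum(math.exp(l - m) for l in logits)))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [vi / n for vi in v]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cross_entropy(x, W, y):
    """Plain softmax cross-entropy; W holds one weight vector per speaker."""
    return -log_softmax_at([dot(w, x) for w in W], y)


def triplet_center_loss(x, centers, y, margin=1.0):
    """Hinge on squared distances: x must be closer to its own class center
    than to the nearest other center by at least `margin`."""
    d_pos = sq_dist(x, centers[y])
    d_neg = min(sq_dist(x, c) for j, c in enumerate(centers) if j != y)
    return max(d_pos + margin - d_neg, 0.0)


def am_softmax_loss(x, W, y, s=30.0, m=0.35):
    """Additive margin softmax: cosine logits, margin m subtracted from the
    target class only, all logits scaled by s."""
    xn = normalize(x)
    cos = [dot(normalize(w), xn) for w in W]
    logits = [s * (c - m) if j == y else s * c for j, c in enumerate(cos)]
    return -log_softmax_at(logits, y)


def multi_metric_loss(x, W_ce, W_am, centers, y, lam_tc=0.1, lam_am=1.0):
    """Weighted sum of the three terms; lam_tc and lam_am are hypothetical."""
    return (cross_entropy(x, W_ce, y)
            + lam_tc * triplet_center_loss(x, centers, y)
            + lam_am * am_softmax_loss(x, W_am, y))


# Toy example: 3 speakers, 2-D embeddings, fixed (untrained) parameters.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
centers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
x = [1.0, 0.0]  # lies exactly on the center of speaker 0
total = multi_metric_loss(x, W, W, centers, y=0)
```

With these toy values the embedding sits on its own class center, so the triplet center term is zero; relabeling it as speaker 1 makes that term 3.0, because the sample is then farther from its assigned center than from another one.
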
Keywords/Search Tags:

Speaker recognition, Frequency band attention, Residual network, Multi-metric learning

Related items

1	Research On Text-independent Speaker Recognition Based On Attention Mechanism
2	Weighted Pairwise Constraints Metric Learning Algorithm In Speaker Recognition
3	Multi-scale 3D Residual Attention Network For Facial Expression Recognition
4	Text Independent Speaker Recognition Based On Deep Learning Framework
5	Research Of Speaker Recognition Technology Based On Kaldi
6	Speaker Recognition Algorithm Based On Residual Network
7	Research On Deep Learning Based Speaker Recognition Modeling
8	Research On Face Recognition Method Under Unconstrained Condition
9	Research On Distance And Similarity Metric Learning For Speaker Recognition
10	Speaker Recognition Research Based On GMM Speaker Clustering Technology