
Research And Implementation Of High Recognition Rate Voiceprint Recognition Technology Based On Convolutional Neural Network

Posted on: 2022-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q Ma
GTID: 2518306764477474
Subject: Automation Technology
Abstract/Summary:
Voiceprints are unique, and voiceprint recognition is an active, cutting-edge research topic in biometric authentication. This thesis studies text-independent voiceprint recognition: it uses a residual convolutional neural network to extract voiceprint features and adds attention mechanisms to improve recognition performance. It also studies the influence of different loss functions on voiceprint recognition and proposes an AP loss function suited to this field. Finally, it adopts data augmentation strategies to further reduce the equal error rate (EER) of the voiceprint recognition system. The main work and contributions of the thesis are summarized as follows:

(1) The voiceprint feature extraction module is designed and improved on the basis of the original ResNet-34 model. The Convolutional Block Attention Module (CBAM) attention mechanism and a self-attention pooling (SAP) encoding layer are introduced into the model, yielding the proposed FRACNN2D neural network model. Compared with the x-vector model on the VoxCeleb1 dataset, this model reduces the EER by 2.01% and the MinDCF by 0.33, and its ROC-AUC is 0.85% lower. Compared with the ResNet-34 model trained on the VoxCeleb2 dataset, the EER is reduced by 1.72%.

(2) The influence of different loss functions on voiceprint recognition is studied, including the traditional Softmax loss function and its improved variants, the AM-Softmax and ArcFace loss functions, for different values of the coefficient m and the hyperparameter s. The GE2E and Prototypical loss functions, which are based on deep metric learning, are also analyzed and tested. The original Prototypical loss function is improved to obtain the proposed AP loss function, which uses a cosine metric to introduce scale invariance and rotation invariance. Finally, the EER is used to compare the performance of the different loss functions: the best-performing one, the AP loss function, lowers the EER on the VoxCeleb1 dataset by up to 4.57%.

(3) Two types of data augmentation strategies are applied to the FRACNN2D-AP neural network model proposed in this thesis. The first adds additive noise, such as Music, Bubble, and RIR; the second uses the SpecAugment strategy, comprising the three strategies Time Warp, LB, and LD. The two types of strategy reduce the EER by 0.11% and 0.19%, respectively.
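Since every result above is reported as an EER, a minimal sketch of how that metric can be computed from verification trial scores may help. It is not taken from the thesis: the function name `equal_error_rate`, the brute-force threshold sweep, and the convention that a higher score means "same speaker" are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for a set of verification trial scores.

    Assumption: a higher score means the two utterances are more likely
    to come from the same speaker.
    """
    if len(target_scores) == 0 or len(nontarget_scores) == 0:
        raise ValueError("need at least one target and one non-target score")

    best_gap = float("inf")
    best_eer = 1.0
    best_threshold = float("nan")

    # Every observed score is tried as a candidate decision threshold.
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2.0
            best_threshold = threshold

    return best_eer, best_threshold


if __name__ == "__main__":
    # Toy scores, invented for this example only.
    same_speaker = [0.9, 0.8, 0.7, 0.3]
    different_speaker = [0.6, 0.4, 0.2, 0.1]
    eer, threshold = equal_error_rate(same_speaker, different_speaker)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

The function returns the EER as a fraction, so multiply by 100 to compare it with percentage figures. The sweep is quadratic in the number of trials, which is fine for a toy example; trial lists the size of VoxCeleb1 would normally call for a sorted, ROC-based computation instead.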
Keywords/Search Tags:voiceprint recognition, convolutional neural network, attention mechanism, FRACNN2D, data augmentation