Research On Speaker Recognition Algorithm Based On Deep Convolutional Neural Network

Posted on: 2022-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: M H Qi
GTID: 2518306524990499
Subject: Master of Engineering
Abstract/Summary:
Biometric recognition technology plays an increasingly important role in authentication on intelligent terminals. Speaker recognition (SR) is one of the most popular biometric recognition technologies. Depending on the application scenario, SR can be divided into speaker identification (SI) and speaker verification (SV). It can also be divided into two categories: text-dependent (TD) and text-independent (TI). This thesis focuses on the more challenging text-independent speaker recognition (TI-SR). A deep convolutional neural network (CNN) is used to extract speaker features. Speaker identification and speaker verification experiments are carried out on two open-source datasets. Finally, a Web-based speaker recognition system is implemented with separate front end and back end. The main work and contributions of this thesis are summarized as follows:

Firstly, the speaker recognition framework is designed and improved. The framework consists of four stages: training, fine-tuning, registration, and evaluation. In the training stage, the general background model is obtained. The model is then optimized in the fine-tuning stage. In the registration stage, the model generates the registered speaker models. In the evaluation stage, the similarity score between the test utterance and the speaker models is calculated, and the decision is made based on that score.

Secondly, two deep convolutional neural network structures based on attention mechanisms are proposed for feature extraction. The SE attention mechanism and the CBAM attention mechanism are used to improve the residual block in different ways, and the network is deepened by stacking the modified residual blocks. This yields the SECNN model and the Attentive CNN model. The input of the models is a spectrogram generated by preprocessing the audio signal, and the output is a sentence-level embedding of the speaker. The SI experiments show that the SECNN and Attentive CNN models reach accuracies of 95.15% and 95.31%, respectively, on the Librispeech dataset. The SV experiments show that the SECNN and Attentive CNN models reach equal error rates (EER) of 5.82% and 6.55%, respectively, on the Librispeech dataset. Both proposed models outperform the baseline model in the speaker identification and speaker verification experiments.

Thirdly, the triplet loss is used in the fine-tuning stage to improve the performance of the SECNN and Attentive CNN models, with two different triplet sampling methods: random triplet sampling and optimized triplet sampling. Both models are fine-tuned with these methods to minimize the triplet loss. The SV experiments show that, compared with the same models without fine-tuning, the SECNN and Attentive CNN models fine-tuned with the optimized triplet sampling method reduce the EER by 2.26% and 2.07%, respectively, on the TIMIT dataset, and by 0.61% and 1.19%, respectively, on the Librispeech dataset.
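
The quantities named in the abstract (a similarity score between embeddings, the triplet loss, and the EER) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not state which similarity measure or margin is used, so cosine similarity and a margin of 0.2 are assumptions, the function names are hypothetical, and the EER is approximated by a simple threshold sweep.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine similarity.

    The loss is zero once the anchor is more similar to the positive
    (same speaker) than to the negative (different speaker) by at least
    `margin`. The margin value is an assumption, not taken from the thesis.
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, sim_neg - sim_pos + margin)


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= threshold. The EER is taken
    at the threshold where the false acceptance rate (FAR) and the false
    rejection rate (FRR) are closest, as the mean of the two rates.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

In a verification setting, `genuine_scores` would hold similarity scores for test utterances compared against their own registered speaker model, and `impostor_scores` the scores against other speakers' models; a lower EER means better separation between the two.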
Keywords/Search Tags:Speaker Recognition, Convolutional Neural Network, Attention Mechanism, Triplet Loss