Study On Speaker Recognition Based On Deep Learning

Posted on:2023-06-11

Degree:Master

Type:Thesis

Country:China

Candidate:Q P Xu

GTID:2568306830480134

Subject:Electronic and communication engineering

Abstract/Summary:

Speaker recognition is the task of identifying a person from their voice by means of speech signal processing, and it has broad application prospects in network communication, consumer electronics, intelligent terminals, human-computer interaction, secure payment, and other fields. It has three major branches: speaker identification, speaker verification, and speaker diarization. It can also be divided into text-dependent and text-independent recognition according to its dependence on text content. In recent years, with the development of deep learning, speaker recognition based on deep neural networks has made new progress, but the large parameter counts and computational cost of these models limit their use in embedded systems, and their recognition performance still needs to be improved. This thesis studies text-independent speaker recognition based on deep learning. The main work is as follows:

(1) To reduce the high computational resource requirements of existing methods, an efficient speaker recognition model based on the TSCA-Res MBConv structure is proposed. The model greatly reduces the number of parameters by introducing Fused MBConv and MBConv structures. The TSCA module is motivated by the observation that a speaker may produce more characteristic sounds during some time periods than during others. Compared with the SE module, TSCA can establish an association between channel information and time segments, thereby improving the recognition performance of the model. Experimental results show that the proposed TSCA-Res MBConv structure achieves better recognition performance than the benchmark method with fewer parameters.

(2) Considering the strong correlation between speaker verification and speaker identification, as well as the characteristics of the time pooling layer in the general structure of speaker recognition networks, a time pooling method combined with a multi-task learning scheme is proposed to improve recognition performance. In this method, a multi-task learning strategy is applied so that the query vector of the self-attention mechanism used by the time pooling layer carries more effective information. During model training, the Triplet loss, which considers sample pairs, and AAM-Softmax, which considers category information, are combined to construct a network model containing both a speaker verification task and a speaker identification task. Experiments show that this scheme effectively improves recognition performance on the speaker verification task.
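The abstract does not give the network details, so the following is only a minimal sketch of the two generic ingredients named in point (2): attention-based time pooling driven by a single query vector, and a training loss that adds a Triplet term to an AAM-Softmax term. All function names, the cosine-based form of the triplet term, the margin and scale values, and the weighting factor `alpha` are illustrative assumptions, not the thesis's actual implementation.

```python
import math

def attentive_pool(frames, query):
    """Pool T frame vectors into one utterance vector.
    Weights are softmax(frame . query); output is the weighted sum of frames."""
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in frames]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

def cosine(a, b):
    """Cosine similarity between two vectors."""
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def aam_softmax_loss(embedding, class_weights, label, margin=0.2, scale=30.0):
    """Additive angular margin softmax: add `margin` to the angle of the
    true class, scale the cosines, then apply cross-entropy.
    (The usual handling of theta + margin > pi is omitted in this sketch.)"""
    logits = []
    for k, w in enumerate(class_weights):
        c = max(-1.0, min(1.0, cosine(embedding, w)))
        if k == label:
            c = math.cos(math.acos(c) + margin)
        logits.append(scale * c)
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on cosine similarity: the anchor should be closer to the
    same-speaker sample than to the different-speaker sample by `margin`."""
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)

def multitask_loss(anchor, positive, negative, class_weights, label, alpha=1.0):
    """Identification term (AAM-Softmax) plus weighted verification term (triplet).
    `alpha` is a hypothetical balancing weight."""
    ident = aam_softmax_loss(anchor, class_weights, label)
    verif = triplet_loss(anchor, positive, negative)
    return ident + alpha * verif
```

In this sketch the pooled utterance vector from `attentive_pool` would play the role of the embedding passed to `multitask_loss`, so that gradients from both tasks reach the query vector during training.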

Keywords/Search Tags:

Speaker Recognition, Speaker Verification, Attention Mechanism, Depthwise Separable Convolution, Multi-task Learning, Metric Learning

Related items

1	Text Independent Speaker Recognition Based On Deep Learning Framework
2	Research On Speaker Recognition Method Based On Multi-Task Learning
3	Speaker Extraction And Verification Based On Deep Learning
4	Research On Communication Signal Modulation Recognition Based On Deep Learning
5	Research On Multi-task Learning Based Far-field Speaker Verification
6	Speaker Recognition Algorithm Based On Frequency Band Attention And Multi-metric Learning
7	Speaker Verification And Person Re-identification Based On Deep Metric Learning
8	Research Of Robust Speaker Verification Based On Deep Learning
9	Research On Voiceprint Verification Technology In Multi-speaker Scenarios Based On Deep Learning
10	Co-channel Speaker Recognition Based On Deep Learning