Speaker recognition technology exploits differences in speakers' speech characteristics to distinguish their identities. With the advent of the 5G era, the technology has been widely applied to mobile phone voice assistants, smart homes, and other scenarios, and with the broad adoption of artificial intelligence, speaker recognition based on deep learning has become a hot research direction. However, issues such as how to improve network adaptability and how to balance network performance against the number of parameters still merit in-depth study. This thesis analyzes and studies text-independent speaker recognition methods based on deep learning. The specific research content is as follows:

First, to improve the network's ability to adapt to frame-level speech features and to aggregate global information, an adaptive multi-scale speaker recognition method based on multi-head attention pooling is designed. The network combines a one-dimensional selective kernel module, Res2Net, and a channel attention mechanism into an intra-layer adaptive multi-scale module that extracts more representative speaker features and fuses multiple receptive fields in the feature-fusion stage. Finally, multi-head attention pooling is used to extract utterance-level speaker features, which effectively improves the performance of the speaker recognition system.

Second, to enable the network to aggregate both global information and neighboring context information, a speaker recognition model based on the Contextual Transformer (CoT) is designed. The CoT module is added to the residual network, which effectively improves network performance. In addition, the input spectrum is augmented in different ways before being fed into the network, which is more conducive to training. For the loss function, the prototype loss based on meta-learning and a global
classification loss based on softmax are jointly used to train the network. Comparative experiments show that this design improves the recognition accuracy of the system.

Finally, to balance network depth against the number of parameters, a deep inverted-residual speaker recognition method based on time-domain dynamic convolution is designed. A dynamic inverted-residual bottleneck block is proposed whose parameter count and storage footprint remain relatively small as the network deepens, and a time-domain dynamic convolution better suited to speaker recognition is incorporated into it. On this structure, three speaker recognition networks of different depths are built, with time-domain dynamic convolutions aggregating different numbers of convolution kernels. The effectiveness of the proposed method is demonstrated through extensive ablation and comparative experiments.
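The multi-head attention pooling used in the first contribution can be illustrated with a minimal NumPy sketch: each head scores every frame, and the per-head weighted sums are concatenated into one utterance-level embedding. The head count, weight shapes, and channel-splitting scheme below are illustrative assumptions, not the exact design described above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention_pooling(frames, w, n_heads=4):
    """Pool frame-level features (T, D) into a single utterance-level
    vector. Each head computes attention scores over the T frames and
    pools its own D/n_heads channel slice. `w` (n_heads, D) stands in
    for learned scoring weights (random here, purely for illustration)."""
    heads = np.split(frames, n_heads, axis=1)   # n_heads slices of (T, D/n_heads)
    pooled = []
    for h, sub in enumerate(heads):
        scores = softmax(frames @ w[h])         # (T,) attention over frames
        pooled.append(scores @ sub)             # (D/n_heads,) weighted sum
    return np.concatenate(pooled)               # (D,) utterance embedding

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 64))   # 200 frames of 64-dim features
w = rng.standard_normal((4, 64))     # one scoring vector per head
emb = multi_head_attention_pooling(x, w)
print(emb.shape)  # (64,)
```

In practice the scores come from a small learned network and the pooling often concatenates a weighted standard deviation as well; the sketch keeps only the weighted mean for clarity.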
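The time-domain dynamic convolution in the third contribution amounts to mixing several candidate kernels with input-dependent attention weights and then applying a single convolution, so the parameter cost of K kernels is paid without K separate convolutions. The global-average context, the scalar gating, and K = 4 kernels below are simplifying assumptions for illustration only.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_conv1d(x, kernels, attn_w):
    """Dynamic 1-D convolution: K candidate kernels are aggregated with
    input-dependent weights before one convolution is applied.
    x: (T,) waveform-like signal; kernels: (K, ksize); attn_w: (K,)
    stand-in scoring weights (a learned gate in a real network)."""
    context = x.mean()                        # squeeze: global time-average
    mix = softmax(attn_w * context)           # (K,) per-kernel attention
    kernel = (mix[:, None] * kernels).sum(0)  # aggregated kernel (ksize,)
    return np.convolve(x, kernel, mode="same")

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
kernels = rng.standard_normal((4, 5))  # K=4 aggregated kernels of size 5
attn_w = rng.standard_normal(4)
y = dynamic_conv1d(x, kernels, attn_w)
print(y.shape)  # (100,)
```

Because the attention weights depend on the input, different utterances effectively see different kernels, while the stored parameters are just the K kernels plus the small gating weights.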