Speaker recognition technology exploits differences in speakers' speech characteristics to distinguish their identities. With the advent of the 5G era, the technology has been widely applied to mobile phone voice assistants, smart homes, and other scenarios, and with the broad adoption of artificial intelligence, speaker recognition based on deep learning has become a hot research direction. However, issues such as how to improve network adaptability and how to balance network performance against the number of parameters still merit in-depth study. This thesis analyzes and studies text-independent speaker recognition methods based on deep learning. The specific research content is as follows:

First, to improve the network's ability to adapt to frame-level speech features and to aggregate global information, an adaptive multi-scale speaker recognition method based on multi-head attention pooling is designed. The network combines a one-dimensional selective kernel module, Res2Net, and a channel attention mechanism into an intra-layer adaptive multi-scale module that extracts more representative speaker features and fuses multiple receptive fields in the feature-fusion stage. Finally, multi-head attention pooling is used to extract utterance-level speaker features, which effectively improves the performance of the speaker recognition system.

Second, to enable the network to aggregate both global information and neighboring context information, a speaker recognition model based on the Contextual Transformer (CoT) is designed. The CoT module is added to the residual network, which effectively improves network performance. In addition, the input spectrum is augmented in different ways before being fed into the network, which is more conducive to training. For the loss function, the prototype loss based on meta-learning and a global
classification loss based on softmax are jointly used to train the network. Comparative experiments show that this design improves the recognition accuracy of the system.

Finally, to balance network depth against the number of parameters, a deep inverted-residual speaker recognition method based on time-domain dynamic convolution is designed. A dynamic inverted-residual bottleneck block is proposed whose parameter count and storage footprint remain relatively small as the network deepens, and a time-domain dynamic convolution better suited to speaker recognition is incorporated into it. On this structure, three speaker recognition networks of different depths are built, with time-domain dynamic convolutions aggregating different numbers of convolution kernels. The effectiveness of the proposed method is demonstrated through extensive ablation and comparative experiments.
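The multi-head attention pooling used in the first contribution can be illustrated with a minimal NumPy sketch: each head scores every frame, and the per-head weighted sums are concatenated into one utterance-level embedding. The head count, weight shapes, and channel-splitting scheme below are illustrative assumptions, not the exact design described above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention_pooling(frames, w, n_heads=4):
    """Pool frame-level features (T, D) into a single utterance-level
    vector. Each head computes attention scores over the T frames and
    pools its own D/n_heads channel slice. `w` (n_heads, D) stands in
    for learned scoring weights (random here, purely for illustration)."""
    heads = np.split(frames, n_heads, axis=1)   # n_heads slices of (T, D/n_heads)
    pooled = []
    for h, sub in enumerate(heads):
        scores = softmax(frames @ w[h])         # (T,) attention over frames
        pooled.append(scores @ sub)             # (D/n_heads,) weighted sum
    return np.concatenate(pooled)               # (D,) utterance embedding

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 64))   # 200 frames of 64-dim features
w = rng.standard_normal((4, 64))     # one scoring vector per head
emb = multi_head_attention_pooling(x, w)
print(emb.shape)  # (64,)
```

In practice the scores come from a small learned network and the pooling often concatenates a weighted standard deviation as well; the sketch keeps only the weighted mean for clarity.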
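The time-domain dynamic convolution in the third contribution amounts to mixing several candidate kernels with input-dependent attention weights and then applying a single convolution, so the parameter cost of K kernels is paid without K separate convolutions. The global-average context, the scalar gating, and K = 4 kernels below are simplifying assumptions for illustration only.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_conv1d(x, kernels, attn_w):
    """Dynamic 1-D convolution: K candidate kernels are aggregated with
    input-dependent weights before one convolution is applied.
    x: (T,) waveform-like signal; kernels: (K, ksize); attn_w: (K,)
    stand-in scoring weights (a learned gate in a real network)."""
    context = x.mean()                        # squeeze: global time-average
    mix = softmax(attn_w * context)           # (K,) per-kernel attention
    kernel = (mix[:, None] * kernels).sum(0)  # aggregated kernel (ksize,)
    return np.convolve(x, kernel, mode="same")

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
kernels = rng.standard_normal((4, 5))  # K=4 aggregated kernels of size 5
attn_w = rng.standard_normal(4)
y = dynamic_conv1d(x, kernels, attn_w)
print(y.shape)  # (100,)
```

Because the attention weights depend on the input, different utterances effectively see different kernels, while the stored parameters are just the K kernels plus the small gating weights.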