
Research On Speaker Recognition Method Based On Deep Learning

Posted on: 2022-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Xiong
Full Text: PDF
GTID: 2518306539991859
Subject: Information and Communication Engineering
Abstract/Summary:
Speaker recognition, also known as voiceprint recognition, identifies a speaker from the unique characteristics of that person's voice. As an effective biometric technology, it is applied in many areas of everyday life. This thesis focuses on speaker recognition methods based on deep learning and studies the speaker recognition system from two aspects: model construction and feature extraction. The main contents of this dissertation are summarized as follows:

1. A speaker recognition method based on a CNN-TDNN hybrid model is investigated. First, a Convolutional Neural Network (CNN) learns the local spatial features of the log-mel filter bank (FBank) features of speech. Then, a Time-Delay Neural Network (TDNN) models the dynamic time-domain changes in the speech signal to capture its long-term dependencies and the speaker's pronunciation habits. Finally, the Softmax function maps the network output to a probability space, and the speaker is identified according to the output probabilities. Experimental results show that the proposed speaker recognition system based on the CNN-TDNN hybrid model outperforms the x-vector system based on a single TDNN.

2. A robust speaker recognition method based on adaptive feature mapping is studied. Because speech captured in practice usually contains noise of varying intensity, Gaussian Filter Spectrums (GFSs) are obtained by smoothing the noise in the spectrum with multi-scale Gaussian filters. Group convolution is then applied so that the GFS at each scale is convolved separately, and the features at the same position of the resulting feature maps are mapped to the maximum feature space. Finally, the real-time update of network weights is used to adaptively suppress noise of different intensities in the spectrum, so that the network can extract more robust speaker features. Experimental results show that adaptive feature mapping effectively improves the accuracy and robustness of the speaker recognition system.
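
As an illustration of the multi-scale Gaussian filtering described in the second contribution above, the following is a minimal sketch and not the thesis implementation. It rests on several assumptions that the abstract does not confirm: the smoothing is done in 1-D along the frequency axis of each frame, the scales `sigmas` are arbitrary toy values, and "mapped to the maximum feature space" is read as an element-wise maximum across scales. The learned group convolution is omitted, so the maximum is taken directly over the smoothed spectra. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel (weights sum to 1)."""
    if radius is None:
        # Assumption: truncate the kernel at about three standard deviations.
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_frame(frame, kernel):
    """Smooth one spectral frame along frequency, clamping indices at the edges."""
    radius = len(kernel) // 2
    last = len(frame) - 1
    smoothed = []
    for i in range(len(frame)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), last)  # replicate edge bins
            acc += w * frame[idx]
        smoothed.append(acc)
    return smoothed


def gaussian_filter_spectra(spectrum, sigmas):
    """Build one smoothed copy of the spectrum per scale (a stack of GFSs)."""
    return [[smooth_frame(frame, gaussian_kernel(s)) for frame in spectrum]
            for s in sigmas]


def max_feature_map(stack):
    """Element-wise maximum across scales at each (frame, bin) position."""
    n_frames = len(stack[0])
    n_bins = len(stack[0][0])
    return [[max(scale[t][f] for scale in stack) for f in range(n_bins)]
            for t in range(n_frames)]


# Toy spectrum: 2 frames x 8 frequency bins (made-up values, not real speech).
spectrum = [
    [0.0, 1.0, 0.0, 5.0, 0.0, 1.0, 0.0, 0.0],
    [0.2, 0.1, 4.0, 0.3, 0.1, 0.2, 3.0, 0.1],
]
sigmas = [0.5, 1.0, 2.0]  # hypothetical scales

gfs = gaussian_filter_spectra(spectrum, sigmas)
fused = max_feature_map(gfs)
```

Here `gfs` holds one smoothed spectrum per scale and `fused` keeps, at each position, the largest response across scales. In a trained network the per-scale maps would first pass through the group convolution, whose weights decide how strongly each scale contributes.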
Keywords/Search Tags: Speaker recognition, Convolutional neural network, Time-delay neural network, Gaussian filtering, Adaptive feature mapping