Speaker Recognition System Based On Deep Learning

Posted on: 2019-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: J D Zhang
GTID: 2428330545964167
Subject: Engineering
Abstract/Summary:
With the continuous development of speech recognition technology, speaker recognition has received growing attention as an important method of identity authentication. Speaker recognition, also known as voiceprint recognition, identifies a speaker by extracting features that characterize the speaker's identity from the speech signal. As a biometric authentication technology, it has important research value and broad research prospects. Deep learning has driven great progress in speech recognition, and speaker recognition has been deeply affected in turn: more and more researchers have shifted from traditional methods based on probability and statistics to deep learning methods. Inspired by end-to-end models, this thesis uses a deep neural network to extract deep speaker features and improves the network for the case where little training data is available. A speaker recognition system is built from a time-delay neural network (TDNN) and a PLDA back-end model. The improved network consists of 8 hidden layers and one pooling layer. The pooling layer aggregates the output of the preceding hidden layer over time, computes its mean and standard deviation, and passes these statistics as input to the next hidden layer. During speaker enrollment, the output of the last hidden layer of the trained network is used as the speaker feature. In the test phase, the same representation vectors are extracted and averaged, and the PLDA model is then used for scoring. Instead of a single frame of MFCC features, frames spliced together at a fixed step size are used as the network input, so that the input captures long-term speech characteristics. Finally, the improved network model is compared with the traditional i-vector method. The EER decreased by 2.4% on the noisy data sets. In the gender-dependent test, the EER decreased by 0.8% on the female test data set. On the test data set that includes Chinese, the EER decreased by 13.8%.
Keywords/Search Tags:Speaker recognition, TDNN, i-vector, MFCC, PLDA
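
The frame splicing and statistics pooling steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `splice_frames` and `stats_pooling`, the context width, the step size, the edge handling (repeating the first and last frame), and the use of the population standard deviation are all assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import fmean, pstdev


def splice_frames(frames, context=2, step=1):
    """Concatenate each frame with its neighbours at offsets of `step`.

    frames: list of T feature vectors (e.g. MFCCs), each a list of D floats.
    Returns T vectors of length (2 * context + 1) * D.
    Assumption: indices outside the utterance are clamped to the first or
    last frame; the abstract does not say how edges are handled.
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        row = []
        for k in range(-context, context + 1):
            src = min(max(t + k * step, 0), last)  # clamp to a valid index
            row.extend(frames[src])
        spliced.append(row)
    return spliced


def stats_pooling(hidden):
    """Aggregate frame-level outputs over time into one fixed-size vector.

    hidden: list of T vectors of length H (output of the preceding layer).
    Returns a vector of length 2 * H: per-dimension means followed by
    per-dimension standard deviations.
    Assumption: population standard deviation (divide by T).
    """
    columns = list(zip(*hidden))  # one tuple per feature dimension
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means + stds


# Toy usage with three 2-dimensional frames (made-up values).
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
spliced = splice_frames(frames, context=1)
pooled = stats_pooling(frames)
```

Because the pooled vector has a fixed length regardless of the number of frames, utterances of different durations map to speaker representations of the same size, which is what allows a back end such as PLDA to score them.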