Research On Speaker Identification Based On Deep Learning

Posted on: 2019-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: T T Chen
Full Text: PDF
GTID: 2348330542498849
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of science and technology, speaker recognition has made new progress in the context of deep learning. At present, systems based on the identity vector (i-vector) are the state-of-the-art approach to speaker recognition. Starting from the needs of systems used in everyday life and work, this thesis investigates how deep learning can narrow the gap between the auditory characteristics a computer can learn and those a human can perceive, with the goal of making the computer's judgements agree with a human's. On this basis, the thesis studies speaker recognition from the following aspects:

(1) Study and improvement of the i-vector algorithm for speaker identification. The basic theory and key technologies of the i-vector are studied, and an i-vector-based speaker recognition system is designed and implemented as the baseline. The advantages and disadvantages of the i-vector are analysed, and the feature extraction process is optimized.

(2) Speaker identification based on long short-term memory (LSTM) network and deep belief network (DBN) models. LSTM and DBN models are built, a new node matching algorithm is created, and the models are trained several times for speaker recognition experiments. The optimal DBN depth and the number of input and output nodes of each layer are determined, and the influence of different combinations of DBN layers and different kinds of features is investigated.

(3) Speaker recognition based on spectrograms and convolutional neural networks (CNNs). The spectrograms of different speech segments are brought to the same size by sampling, which removes the problem of audio sequences having different lengths, and are used as input to two CNN architectures, a VGG network and a residual network. The layer and node settings of the two networks are tuned, and speaker recognition experiments are carried out to investigate whether CNNs can improve the performance of the system. In addition, an attempt is made to merge the networks by adding a single-layer DNN, and to verify whether this improves performance.

The experimental results are compared across models and features to investigate which feature parameters are optimal for the speaker recognition system and which neural network structure is most suitable for speaker recognition.
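The size-unification step in aspect (3) can be illustrated with a minimal sketch. The abstract does not say which sampling scheme the thesis uses, so the linear interpolation along the time axis shown here is an assumption, as are the function names `log_spectrogram` and `resize_time_axis`, the frame length, the hop size, and the target width of 64 frames.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram of shape (frame_len // 2 + 1, n_frames)."""
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad very short signals so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_bins)
    return np.log(magnitude + 1e-10).T  # (n_bins, n_frames)


def resize_time_axis(spec, target_frames):
    """Resample a spectrogram to a fixed number of frames.

    Assumption: linear interpolation along time; the thesis may use
    a different sampling scheme.
    """
    n_bins, n_frames = spec.shape
    old_pos = np.linspace(0.0, 1.0, n_frames)
    new_pos = np.linspace(0.0, 1.0, target_frames)
    return np.stack([np.interp(new_pos, old_pos, row) for row in spec])


# Two synthetic "utterances" of different lengths end up with the same shape.
rng = np.random.default_rng(0)
short_clip = rng.standard_normal(8000)
long_clip = rng.standard_normal(24000)
fixed = [resize_time_axis(log_spectrogram(x), 64) for x in (short_clip, long_clip)]
print([f.shape for f in fixed])  # [(129, 64), (129, 64)]
```

Because every utterance is mapped to the same (frequency bins, frames) shape, the results can be stacked into a batch for a CNN with a fixed input size. Interpolating along time stretches or compresses the utterance, whereas cropping or padding would preserve its tempo; which trade-off the thesis makes is not stated in the abstract.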
Keywords/Search Tags: feature extraction, deep learning, joint factor analysis, identity vector, deep belief neural network, long short-term memory, spectrogram, convolutional neural network, residual network