Research on Speaker Recognition Based on Deep Neural Network

Posted on: 2021-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Zhang
GTID: 2518306497965569
Subject: Control Science and Engineering
Abstract/Summary:
With the development of the deep neural network (DNN), DNNs have had a wide influence on machine learning and pattern recognition, and have been used successfully in image recognition and automatic speech recognition (ASR). This success has inspired researchers to apply DNNs to other tasks in the hope of similar gains. Applying a DNN to a new field, however, requires new domain knowledge. This thesis focuses on DNN-based speaker recognition. Speaker recognition, also called voiceprint recognition, determines a speaker's identity from his or her voice. Traditionally, speaker recognition uses the Gaussian mixture model (GMM) framework based on Mel-frequency cepstral coefficients (MFCC), with each speaker modeled by a GMM. Because the GMM is built on the short-term stationarity of the voice signal, it is a generative model, and it does not represent the speaker's characteristics well. Neural networks are a strong candidate for directly modeling the discriminative characteristics of speakers, and with the recent development of DNN technology their use for feature extraction and pattern recognition has received increasing attention.

Against this background, this thesis studies a DNN-based speaker recognition system. Compared with the traditional approach, this method changes how speaker features are extracted. The main research work is as follows. First, the pre-processing steps for the voice signal are introduced, together with their methods and significance. The extraction of MFCC and FBank features is explained, and the two are compared. Next, the estimation methods for the speaker identification model are introduced: EM estimation of the GMM, and MAP estimation of the speaker model from the universal background model (UBM). The widely used I-Vector model is also introduced. Finally, the traditional I-Vector model is improved and a DNN-based I-Vector model is proposed. Because a neural network can learn abstract representations from data, it is used to extract speaker features. The theoretical basis of this method and the steps taken to improve the previous model are described in detail. A new estimation algorithm replaces maximum a posteriori (MAP) estimation in building the speaker model. The effect of different activation functions on the speaker model is analyzed, and the most suitable one is selected. To address overfitting, a dropout layer is added to the neural network. Because the matrix estimation in the I-Vector model is difficult, a new method is proposed to replace it and is demonstrated concretely.

Finally, a speaker model based on a deep neural network is trained, and experiments show that the new model is practical in comparison with the traditional I-Vector and GMM-UBM models. The experiments mainly use the TIMIT corpus and a self-built voice library. The feasibility of the DNN-based speaker recognition system is tested in several ways, including different amounts of speech, different utterance lengths, and different speaker genders. The system is then compared with the traditional GMM-UBM and I-Vector models to see whether the new method improves the speaker recognition rate. Finally, the recognition rate of the method and its robustness to noise are analyzed under different noise backgrounds.
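
For readers unfamiliar with the GMM-UBM baseline and MAP estimation mentioned in the abstract, the following is a minimal sketch of mean-only MAP adaptation of a background GMM toward one speaker's feature frames. It is not the thesis's implementation: the function names, the diagonal-covariance assumption, the relevance factor of 16, and the toy one-dimensional data are all assumptions made for illustration. It shows only the traditional baseline, not the new estimation algorithm the thesis proposes in its place.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the max before exponentiating, for stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a background GMM (hypothetical helper).

    relevance is an assumed relevance factor; it is not taken from the thesis.
    """
    k, d = len(means), len(means[0])
    counts = [0.0] * k                       # soft frame count per component
    sums = [[0.0] * d for _ in range(k)]     # posterior-weighted frame sums
    for x in frames:
        post = posteriors(x, weights, means, variances)
        for c in range(k):
            counts[c] += post[c]
            for j in range(d):
                sums[c][j] += post[c] * x[j]
    # Equivalent to alpha * E[x] + (1 - alpha) * prior_mean,
    # with alpha = count / (count + relevance).
    return [
        [(sums[c][j] + relevance * means[c][j]) / (counts[c] + relevance)
         for j in range(d)]
        for c in range(k)
    ]

# Toy example: a two-component background model and 16 identical frames.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0], [10.0]]
ubm_vars = [[1.0], [1.0]]
speaker_frames = [[1.0] for _ in range(16)]
adapted = map_adapt_means(speaker_frames, ubm_weights, ubm_means, ubm_vars)
```

In this toy case the component near the speaker's frames moves partway toward them, while the component that receives almost no frames stays close to its background mean. That behavior is what lets a speaker model be estimated from limited enrollment data.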
Keywords/Search Tags:speaker verification, I-Vector, DNN, GMM-UBM