Research On Speaker Recognition Based On Deep Neural Network

Posted on: 2021-04-14 | Degree: Master | Type: Thesis
Country: China | Candidate: C Y Sun | Full Text: PDF
GTID: 2428330614456382 | Subject: Bionic Equipment and Control Engineering
Abstract/Summary:
In recent years, speech recognition technology has developed continuously and found more and more applications. Speaker recognition has also received a great deal of attention as an important method of identity authentication, and researchers have applied deep learning to it with remarkable results. The main purpose of this thesis is to improve the recognition rate of text-independent, closed-set speaker recognition using deep neural networks. A large number of experiments have shown that, for speaker recognition based on deep learning, the quality of the speaker feature parameters and of the acoustic model strongly affects the quality of the recognition system. The main work of this thesis is therefore to improve the window function used when preprocessing the feature parameters and to optimize the existing acoustic model for training and testing. Experiments show that the accuracy of the improved speaker recognition system is effectively increased, which indicates that the method used in this thesis is valuable and has reference significance for future research.

The thesis first introduces the overall framework of speaker recognition and the extraction process of three feature parameters often used in speaker recognition, and compares their advantages and disadvantages. Based on an analysis of how MFCC coefficients are extracted, and in order to make the feature parameters carry more information about the speaker's voice, an improvement is proposed to the key step of speech windowing, that is, to the Hamming window used. Mathematical analysis shows that the main significance of the newly designed window function, which builds on the original Hamming window, for extracting MFCC feature parameters is that it adds feature information such as the slope and phase of the speech power spectrum. Experiments show that the improved feature parameters effectively improve the efficiency of later training and thus the accuracy of speaker recognition.

Next, the shortcomings of the gated recurrent unit neural network are analyzed, and a deep bidirectional gated recurrent unit (BiGRUs) neural network is proposed as the acoustic model for speaker recognition. To address vanishing gradients and overfitting in BiGRUs, the thesis combines the Maxout network with the Dropout regularization algorithm to improve the BiGRUs acoustic model, and proposes the BiGRUs-DM acoustic model. Experimental results show that the improved BiGRUs-DM speaker recognition model is superior to BiGRUs, BiLSTMs and other models, and effectively improves speaker recognition performance.

Finally, the improved speaker recognition system is tested and analyzed on the THCHS-30 Chinese corpus and on a self-made corpus. Experimental results show that the speaker recognition system established in this thesis has stronger generalization ability and a higher recognition rate than the traditional RNN-based speaker recognition system.
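
The abstract does not give the formula of the improved window function or the layer configuration of BiGRUs-DM, so neither is reproduced here. As a point of reference only, the following minimal Python sketch shows the standard building blocks the abstract names: the conventional Hamming window applied to overlapping speech frames, a single Maxout unit, and inverted Dropout. All function names, parameters, and sizes are hypothetical choices made for illustration; they are not taken from the thesis.

```python
import math
import random


def hamming_window(n_samples):
    """Conventional Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)).

    This is the baseline window, not the thesis's improved window.
    """
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def frame_and_window(signal, frame_len, hop):
    """Split a signal into overlapping frames and taper each with the window."""
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


def maxout(x, weights, biases):
    """Maxout unit: the maximum over k affine functions of the input x.

    weights is a list of k weight vectors; biases is a list of k scalars.
    """
    return max(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
               for w, b in zip(weights, biases))


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate` while
    training and rescale the survivors by 1/(1 - rate); identity otherwise."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


if __name__ == "__main__":
    # Toy signal: 10 constant samples, frames of 4 samples, hop of 2.
    frames = frame_and_window([1.0] * 10, frame_len=4, hop=2)
    print(len(frames), "frames of", len(frames[0]), "samples")
    # One Maxout unit with k = 2 pieces over a 2-dimensional input.
    print(maxout([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, -0.5]))
    print(dropout([1.0, 2.0, 3.0], rate=0.5, rng=random.Random(0)))
```

In a full MFCC pipeline the windowed frames would next pass through an FFT, a mel filter bank, a logarithm, and a DCT; in a recurrent acoustic model, Maxout and Dropout would be applied to the outputs of the recurrent layers rather than to raw lists as in this sketch.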
Keywords/Search Tags: Speaker Recognition, Deep Neural Network, Recurrent Neural Network, MFCC, BiGRUs, Maxout, Dropout