
Research Of Speaker Recognition Based On Neural Network

Posted on: 2020-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Qiu
Full Text: PDF
GTID: 2428330596975541
Subject: Engineering
Abstract/Summary:
With the rapid development of information technology, identity authentication plays an important part in many scenarios. However, increasingly intelligent systems bring hidden risks along with convenience. Speaker recognition has become a research hotspot with considerable room for development, and it has great commercial value because of its reliability, security, and low cost. Neural network technology has also made rapid progress. As the deep learning research boom continues, the prospects for applying deep neural networks in engineering fields keep broadening. Applying neural networks to voiceprint recognition greatly improves accuracy, but voiceprint recognition has been studied for a relatively short time and many problems remain unsolved.

First, most existing voiceprint recognition systems require the spoken text to be the same, that is, they perform text-dependent speaker recognition, whereas text-independent recognition is more widely used in practice. Second, voiceprint recognition needs a large amount of voice data from the target speaker; with less data the model is inadequately trained and accuracy drops sharply. Third, speaker templates are extracted by random sample selection, and random selection introduces errors because of noise. Finally, speaking rate has a great impact on the accuracy of existing voiceprint recognition systems, but there is no effective method for handling it.

This thesis studies a text-independent speaker verification system based on a deep neural network. Mel-frequency cepstral coefficients (MFCC) are used as the speech feature, and a voiceprint recognition system based on a fully connected deep neural network (DNN) is built as the baseline. To address the sharp increase in error rate when target speaker data is insufficient, the thesis improves the baseline system, raising the accuracy of the improved model by nearly 10%. The training method of the baseline system is then improved using the principle of transfer learning. To reduce the error caused by random speaker template selection, the k-means algorithm is used to select the speaker templates. On this basis, the thesis examines how using different frames as the neural network's input affects the results, and further improves accuracy by adding a voting step over multiple similarity results. Finally, because the speech of the same speaker at different speaking rates can hardly be recognized correctly in text-independent speaker recognition, the thesis implements a hybrid model of a deep belief network and a deep neural network (DBN-DNN). The learning objective of the network is changed from classification to judging whether two speech templates belong to the same speaker. This reduces the randomness in extracting speaker templates and makes templates easier to obtain. In addition, speech at different speaking rates is added to the training data, which further improves recognition accuracy by at least 7% and improves the robustness of the model for the same speaker at different speaking rates.
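The template selection and voting steps described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the use of cosine similarity, the majority-vote rule, and the 0.8 threshold are all assumptions made for illustration, and the input vectors stand in for whatever speaker features the network produces.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans_templates(vectors, k, iters=20, seed=0):
    """Return k centroids of the enrollment vectors (plain Lloyd's k-means).

    The centroids replace randomly picked samples as speaker templates.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: math.dist(v, centroids[i]))
            clusters[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def verify(test_vectors, templates, threshold=0.8):
    """Accept the claim if a strict majority of test segments match a template.

    Each test segment casts one vote: True when its best cosine similarity
    against any template reaches the (hypothetical) threshold.
    """
    votes = [
        max(cosine(v, t) for t in templates) >= threshold
        for v in test_vectors
    ]
    return sum(votes) > len(votes) / 2
```

Under these assumptions, a call such as `verify(segments, kmeans_templates(enrollment, k=3))` would compare each test segment against three cluster centroids instead of three randomly chosen enrollment samples, so a single noisy sample has less influence on the decision.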
Keywords/Search Tags: speaker recognition, feature extraction, Mel-frequency cepstral coefficients, artificial neural network