
Research On Speaker Recognition Method Based On RNN In Noisy Environment

Posted on: 2020-01-05  Degree: Master  Type: Thesis
Country: China  Candidate: X Han
GTID: 2428330575491199  Subject: Electronic and communication engineering
Abstract/Summary:
With the arrival of the information age and the development of network communication, protecting information security has become an important issue. Identification based on personal biometric characteristics has become an important means of ensuring information security. Speaker recognition, also called voiceprint recognition, verifies a speaker's identity from the individual characteristics of his or her voice. A person's voice characteristics are extremely difficult to imitate, so speaker recognition can identify individuals with a high recognition rate. Compared with other biometric methods such as fingerprint, face, and DNA identification, speaker recognition is more convenient and cheaper, and it has been widely recognized by scholars in China and abroad. This thesis focuses on improving the quality of speech signals, improving the accuracy of feature parameters, and addressing the low recognition rate of speaker recognition systems in noisy environments.

First, the preprocessing of speech signals is studied. The double-threshold endpoint detection method is used to remove the leading and trailing non-speech segments, which are irrelevant to speaker recognition, and spectral subtraction is used to improve the signal-to-noise ratio (SNR) of the speech signal. The extraction methods for Linear Prediction Cepstral Coefficients (LPCC), Mel Frequency Cepstral Coefficients (MFCC), and Gammatone Frequency Cepstral Coefficients (GFCC) are described. All three feature types are extracted from the speech signals and used to train and test a Gaussian Mixture Model (GMM); the resulting speaker recognition rates are obtained, and the relationship between the recognition rate and the number of GMM mixture components is analyzed. The results show that GFCC suits the GMM better than LPCC and MFCC and gives a higher recognition rate, and that all three feature types reach their highest recognition rate with 50 mixture components.

Next, a study of Recurrent Neural Networks (RNN) shows that the original model suffers from low information utilization and is prone to dead neurons. Therefore, building on the original model, the number of hidden layers is increased and the activation function of the hidden layer is changed from the traditional sigmoid to Leaky ReLU. In addition, the first and last groups of data in the input layer are set to zero to make more effective use of the data. The result is an improved Denoising Recurrent Neural Network (DRNN) with fast computation, good convergence, and a high recognition rate. Using this model, speech signals with random semantic content from the speech corpus, sampled at 6 kHz and 2 seconds long, are studied at SNRs of -10, -5, 0, 5, 10, 15, 20, and 25 dB. In these noisy conditions, the improved model is used to denoise the MFCC and GFCC features, and the effects of the traditional and improved models on the speaker recognition rate are compared. Experimental results show that, compared with the traditional model, the improved DRNN model achieves a higher speaker recognition rate, with a maximum increase of 40%. With the improved DRNN model, the recognition rate rises gradually as the SNR of the speech signal increases, reaching up to 93%. These results indicate that the improved DRNN model can effectively remove noise from the feature parameters of noisy speech and improve the recognition rate, making it suitable for speaker recognition under various background noise conditions in practical engineering applications.
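The abstract names double-threshold endpoint detection without giving its parameters. A minimal NumPy sketch, assuming the classic two-stage formulation: short-time energy with a high and a low threshold, followed by a zero-crossing-rate pass that recovers unvoiced edges. The frame size, hop, threshold values, and the helper names `frame_signal`, `double_threshold_vad`, and `trim_silence` are hypothetical choices, not values or code from the thesis.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames; assumes len(x) >= frame_len."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def double_threshold_vad(x, frame_len=256, hop=128,
                         high_ratio=0.3, low_ratio=0.05, zcr_thr=0.3):
    """Return a boolean speech flag per frame.

    Stage 1: frames above a high energy threshold seed a speech run, which is
    extended outward while energy stays above a low threshold.
    Stage 2: the run is extended further while the zero-crossing rate is high
    (unvoiced onsets and offsets). Energy thresholds are fractions of the peak
    frame energy; zcr_thr is a fraction of adjacent-sample sign changes.
    """
    frames = frame_signal(x, frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)
    signs = np.sign(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    e_high, e_low = high_ratio * energy.max(), low_ratio * energy.max()

    n = len(energy)
    speech = np.zeros(n, dtype=bool)
    i = 0
    while i < n:
        if energy[i] <= e_high:
            i += 1
            continue
        start = i
        while i < n and energy[i] > e_high:  # run of high-energy frames
            i += 1
        end = i
        while start > 0 and energy[start - 1] > e_low:  # stage 1: low energy threshold
            start -= 1
        while end < n and energy[end] > e_low:
            end += 1
        while start > 0 and zcr[start - 1] > zcr_thr:  # stage 2: zero-crossing rate
            start -= 1
        while end < n and zcr[end] > zcr_thr:
            end += 1
        speech[start:end] = True
        i = end
    return speech


def trim_silence(x, speech, frame_len=256, hop=128):
    """Keep the samples from the first to the last frame flagged as speech."""
    idx = np.flatnonzero(speech)
    if idx.size == 0:
        return x[:0]
    return x[idx[0] * hop: idx[-1] * hop + frame_len]
```

In practice the thresholds are often estimated from a few leading noise-only frames instead of the peak energy; the fixed ratios here just keep the sketch short.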
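Spectral subtraction, used in the preprocessing step to raise the SNR, estimates the noise magnitude spectrum and subtracts it from every frame. A minimal sketch of basic magnitude subtraction with overlap-add resynthesis, assuming the first few frames contain noise only and using a spectral floor to avoid negative magnitudes. The parameters `alpha`, `floor`, and `noise_frames` are illustrative; the thesis does not state which variant it used.

```python
import numpy as np


def spectral_subtraction(noisy, n_fft=256, noise_frames=10, alpha=1.0, floor=0.01):
    """Magnitude spectral subtraction with overlap-add resynthesis.

    The mean magnitude spectrum of the first `noise_frames` frames is taken as
    the noise estimate. alpha > 1 gives over-subtraction; `floor` keeps a small
    fraction of the noisy magnitude wherever the subtraction would go negative.
    """
    hop = n_fft // 2
    # periodic Hann window: copies at 50% overlap sum to exactly 1
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (len(noisy) - n_fft) // hop
    frames = np.stack([noisy[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    noise_mag = mag[:noise_frames].mean(axis=0)
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)

    # keep the noisy phase, invert each frame, and overlap-add
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=n_fft, axis=1)
    out = np.zeros(len(noisy))
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += clean_frames[i]
    return out
```

Reusing the noisy phase is the standard simplification: at moderate SNR the ear is far less sensitive to phase errors than to magnitude errors.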
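LPCC features come from an all-pole (linear prediction) model of each frame, converted to cepstral coefficients by a recursion. A sketch, assuming the autocorrelation method with Levinson-Durbin and the predictor convention x[n] ~ a_1 x[n-1] + ... + a_p x[n-p]; the order 12 and the function names are assumptions, not the thesis's settings.

```python
import numpy as np


def lpc(frame, order):
    """Levinson-Durbin recursion on the autocorrelation of one (non-silent) frame.

    Returns predictor coefficients [a_1, ..., a_p] and the final prediction error.
    """
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)  # a[j] holds a_j; a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of 1 / (1 - sum_k a_k z^-k); the gain term c_0 is omitted."""
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc(frame, order=12, n_ceps=12):
    """LPCC for one frame: Hamming window, LPC analysis, cepstral recursion."""
    a, _ = lpc(frame * np.hamming(len(frame)), order)
    return lpc_to_cepstrum(a, n_ceps)
```

For a single pole the recursion reduces to c_n = a^n / n, which is a quick sanity check of the indexing.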
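MFCC and GFCC share one pipeline (pre-emphasis, windowed framing, power spectrum, filterbank, compression, DCT) and differ mainly in the filterbank and the compression step. A sketch, assuming triangular mel filters with log compression for MFCC, and for GFCC a frequency-domain approximation of 4th-order gammatone filters spaced on the ERB-rate scale with cube-root compression. That GFCC recipe is one common variant; the thesis may use a different one. Filter counts, FFT size, hop, and the pre-emphasis coefficient are assumptions.

```python
import numpy as np


def power_frames(x, n_fft, hop, pre_emph=0.97):
    """Pre-emphasis, Hamming-windowed framing, and per-frame power spectrum."""
    x = np.append(x[0], x[1:] - pre_emph * x[:-1])
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] for i in range(n_frames)]) * np.hamming(n_fft)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, mid):
            fb[m - 1, k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):
            fb[m - 1, k] = (hi - k) / (hi - mid)
    return fb


def erb_rate(f):
    """Glasberg and Moore ERB-rate scale."""
    return 21.4 * np.log10(4.37e-3 * f + 1.0)


def gammatone_filterbank(n_filters, n_fft, sr, f_low=50.0, order=4):
    """Power responses of gammatone filters with ERB-spaced centre frequencies,
    using |H(f)| ~ (1 + ((f - fc) / b)^2)^(-order/2) with b = 1.019 * ERB(fc)."""
    e = np.linspace(erb_rate(f_low), erb_rate(sr / 2), n_filters)
    fc = (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    bw = 1.019 * 24.7 * (4.37e-3 * fc + 1.0)
    f = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    mag = (1.0 + ((f[None, :] - fc[:, None]) / bw[:, None]) ** 2) ** (-order / 2)
    return mag ** 2


def dct_ii(x, n_out):
    """Unnormalised DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    return x @ np.cos(np.pi * k * (2 * i + 1) / (2 * n)).T


def mfcc(x, sr, n_fft=512, hop=256, n_filters=26, n_ceps=13):
    energies = power_frames(x, n_fft, hop) @ mel_filterbank(n_filters, n_fft, sr).T
    return dct_ii(np.log(energies + 1e-10), n_ceps)  # log compression


def gfcc(x, sr, n_fft=512, hop=256, n_filters=32, n_ceps=13):
    energies = power_frames(x, n_fft, hop) @ gammatone_filterbank(n_filters, n_fft, sr).T
    return dct_ii(np.cbrt(energies), n_ceps)  # cube-root compression
```

With the abstract's 6 kHz sampling rate and 2-second utterances, `mfcc(x, 6000)` and `gfcc(x, 6000)` each return a (45, 13) matrix under these framing settings.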
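In GMM-based speaker identification, each enrolled speaker is modeled by a GMM trained on that speaker's feature frames, and a test utterance is assigned to the speaker whose model gives the highest average log-likelihood. A from-scratch NumPy sketch with diagonal covariances and EM; the initialization (random frames as means), the variance regularizer `reg`, and all function names are assumptions rather than the thesis's implementation.

```python
import numpy as np


def _log_gauss_diag(X, means, variances):
    """log N(x | mean_k, diag(var_k)) for every frame and component -> (n, K)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def _logsumexp(a):
    """Numerically stable log(sum(exp(a))) along axis 1."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


def train_gmm(X, n_components, n_iter=50, reg=1e-3, seed=0):
    """Fit a diagonal-covariance GMM to feature frames X of shape (n, d) with EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + reg, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame
        log_p = np.log(weights) + _log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - _logsumexp(log_p)[:, None])
        # M-step: re-estimate weights, means, and variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = resp.T @ X / nk[:, None]
        variances = np.maximum(resp.T @ X ** 2 / nk[:, None] - means ** 2, 0.0) + reg
    return weights, means, variances


def avg_log_likelihood(X, gmm):
    weights, means, variances = gmm
    return _logsumexp(np.log(weights) + _log_gauss_diag(X, means, variances)).mean()


def identify(X, models):
    """Return the speaker key whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda spk: avg_log_likelihood(X, models[spk]))
```

To mirror the mixture count the abstract reports as best, each speaker model would be trained with `train_gmm(features, 50)`; this sketch's initialization then needs at least 50 training frames per speaker.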
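Test conditions from -10 dB to 25 dB SNR are usually produced by scaling a noise signal so that the clean-to-noise power ratio hits each target. A sketch, assuming the noise array is at least as long as the clean signal; `add_noise_at_snr` and `snr_db` are hypothetical helper names. The scale factor follows from SNR = 10 log10(P_clean / (s^2 P_noise)), so s = sqrt(P_clean / (P_noise * 10^(SNR/10))).

```python
import numpy as np


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_scaled_noise) == snr_db, then add it."""
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


def snr_db(clean, noisy):
    """Measured SNR of `noisy` relative to the reference `clean` signal."""
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Looping `add_noise_at_snr` over (-10, -5, 0, 5, 10, 15, 20, 25) yields the eight noisy versions of each utterance that the experiments describe.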
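The abstract describes the DRNN only at a high level: an added hidden layer, Leaky ReLU in place of sigmoid, and zeroed first and last input groups. The sketch below is a forward pass under stated assumptions: two stacked Elman-style recurrent layers with Leaky ReLU, a linear output that maps noisy feature frames to denoised ones, and the zeroing read as zero-padding at the sequence boundaries when each frame is stacked with its neighbours. That reading is a guess, the weights are random, and training (for example, backpropagation through time on a mean-squared error against clean features) is omitted; all names are hypothetical. Leaky ReLU keeps a small nonzero slope for negative inputs, so a unit's gradient never becomes exactly zero, which is the usual remedy for dead neurons.

```python
import numpy as np


def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)


def stack_context(feats, context=1):
    """Concatenate each frame with its +/- `context` neighbours; positions beyond
    the sequence boundaries are filled with zeros."""
    T, d = feats.shape
    padded = np.vstack([np.zeros((context, d)), feats, np.zeros((context, d))])
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])


def init_params(d_in, d_hidden, d_out, seed=0):
    rng = np.random.default_rng(seed)

    def w(shape):
        return rng.normal(0.0, 0.1, shape)

    return {
        "W1": w((d_in, d_hidden)), "U1": w((d_hidden, d_hidden)), "b1": np.zeros(d_hidden),
        "W2": w((d_hidden, d_hidden)), "U2": w((d_hidden, d_hidden)), "b2": np.zeros(d_hidden),
        "Wo": w((d_hidden, d_out)), "bo": np.zeros(d_out),
    }


def drnn_forward(noisy_feats, params, context=1):
    """Run two recurrent Leaky ReLU layers over the frames; linear output per frame."""
    x = stack_context(noisy_feats, context)
    h1 = np.zeros_like(params["b1"])
    h2 = np.zeros_like(params["b2"])
    out = np.zeros((x.shape[0], params["bo"].shape[0]))
    for t in range(x.shape[0]):
        h1 = leaky_relu(x[t] @ params["W1"] + h1 @ params["U1"] + params["b1"])
        h2 = leaky_relu(h1 @ params["W2"] + h2 @ params["U2"] + params["b2"])
        out[t] = h2 @ params["Wo"] + params["bo"]
    return out
```

For 13-dimensional MFCC or GFCC frames with one frame of context on each side, the input width is 39, so `init_params(39, 64, 13)` gives a model whose output has the same shape as the input feature sequence.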
Keywords/Search Tags: speaker recognition, GFCC, GMM, DRNN