Research On Speaker Recognition In Distracting Environments

Posted on: 2021-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: M L Yang
GTID: 2518306200953099
Subject: Electronics and Communications Engineering
Abstract/Summary:
Speaker recognition is a technology that identifies a speaker from his or her speech. In recent years, speaker recognition methods based on i-vector and x-vector have been developed, but neither models background noise explicitly, and the influence of interfering environments on recognition performance has not been fully considered. As a result, these methods perform poorly in many practical application scenarios. Classical speech enhancement methods such as spectral subtraction have been applied effectively in speech recognition and language recognition, but speaker recognition behaves differently: while suppressing the background noise, such methods also severely damage the acoustic structure of the speaker's speech, so recognition on the noise-suppressed speech remains unsatisfactory. This thesis therefore focuses on the robustness of speaker recognition in interfering environments.

First, the performance of three speech features, filter bank coefficients (Fbank), Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction coefficients (PLP), is studied on the i-vector and x-vector speaker recognition models, and the best-performing features are selected. Second, because traditional spectral subtraction suppresses background noise at the cost of destroying the acoustic structure of the speaker's speech, which limits recognition performance, a deep neural network (DNN) speech enhancement model is proposed as the pre-processing unit of speaker recognition to reduce the influence of the interfering environment. Finally, to compensate for the speech distortion introduced by enhancement, a generative adversarial network (GAN) is built on top of the DNN speech enhancement network, also as part of the pre-processing unit, to augment the data of enrolled speakers and thereby strengthen their identity feature vectors. The result is a speaker recognition model based on DNN denoising and identity vector enhancement, which further improves recognition performance in interfering environments.

Tests under several types of interference show the following average improvements in equal error rate (EER) and minimum detection cost function (MinDCF16) for the proposed model relative to the x-vector baseline model. Under "noisy" interference, EER and MinDCF16 improve by 61.92% and 20.32%, respectively. Under factory noise, they improve by 48.15% and 11.45%. Under music noise, they improve by 55.00% and 18.21%. Under traffic noise, they improve by 56.46% and 20.69%. In summary, the model proposed in this thesis significantly improves speaker recognition performance in interfering environments.
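For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how an EER is commonly computed from verification scores and how a percentage improvement over a baseline can be derived. It is not the thesis's evaluation code. The function names, the threshold sweep, and all score and EER values are hypothetical, and the sketch assumes the reported percentages are relative reductions of the baseline value, which the abstract does not state explicitly.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER (as a fraction) by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: non-target trials scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the operating point where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline, proposed):
    """Relative reduction of an error metric, in percent (assumed definition)."""
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical similarity scores, for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(targets, nontargets)

# Hypothetical baseline and proposed EER values (percent), not taken from the thesis.
gain = relative_improvement(10.0, 4.0)
```

With only a handful of trials the sweep gives a coarse estimate; on real trial lists the EER is usually interpolated between neighboring operating points.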
Keywords/Search Tags:speaker recognition, DNN denoising, generative adversarial network, i-vector, x-vector