Speaker Recognition Based On Multi-domain Analysis

Posted on: 2022-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: W Cao
Full Text: PDF
GTID: 2518306554452454
Subject: Master of Engineering
Abstract/Summary:
Speaker recognition is a technology that uses voice to authenticate identity. It has important applications in national security and business, so studying and implementing it is of practical significance. Under ideal experimental conditions, voice data has a high signal-to-noise ratio and uniform channel conditions, so a speaker recognition system can achieve high performance. In real scenarios, however, the collected speech is easily contaminated by environmental noise and is affected by varying degrees of channel interference, so recognition performance is difficult to guarantee. Reducing the negative effects of environmental noise and channel interference, and thereby improving the robustness of speaker recognition systems in complex environments, is therefore one of the important research goals in this field.

At present, research on the robustness of speaker recognition systems proceeds mainly in three domains: the feature domain, the model domain, and the score domain. In the feature domain, the main goal is to extract representative, noise-resistant features from speech. In the model domain, the main goal is to train a robust speaker matching model and to resolve the channel mismatch between the enrolled model and the test speech through channel compensation. In the score domain, research focuses on the distribution of scores and uses score normalization to make matching scores as insensitive as possible to environmental noise and channel interference. Research across these domains has made considerable progress, but many existing algorithms still have limitations. For complex application environments, it is necessary to remedy the deficiencies of current algorithms and further improve the robustness of speaker recognition systems. This thesis focuses on robust speaker recognition algorithms in the feature domain and the score domain. The main work and contributions are as follows:

(1) Speaker recognition systems based on the GMM-UBM, GMM-SVM, and IVector-PLDA models were constructed, and the feature-domain and score-domain research was carried out on this basis.

(2) In the feature domain, this thesis focuses on Power-Normalized Cepstral Coefficients (PNCC), which are relatively robust in noisy environments. PNCC is improved by adding dynamic difference parameters, and Cepstral Mean Subtraction (CMS) and Cepstral Variance Normalization (CVN) are used to compensate the features. In noisy environments, the improved PNCC performs better than both MFCC and the original PNCC.

(3) In the score domain, this thesis studies existing score normalization methods and, in view of their limitations, proposes the Log-likelihood Normalization (LLN) algorithm. LLN effectively widens the gap between the scores of target speakers and non-target speakers, which improves the decision performance of the system under a single unified threshold. It is applied at the back end of the system and requires no pre-collected development set, so it generalizes well across a variety of systems.

(4) The scores of multiple systems built from MFCC and improved PNCC features with GMM-SVM and IVector-PLDA models are fused, and the LLN method is also applied to normalize the scores. In this way, the information that characterizes the speaker is used effectively and the advantages of the multi-domain methods are combined. Experiments show that score fusion across multiple systems outperforms a single speaker recognition system, and that performance improves further once LLN is added.
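The feature-domain step in item (2) combines two standard operations: appending dynamic difference (delta) parameters to the static cepstra, and applying CMS and CVN. The following is a minimal sketch of those two operations only, not the thesis's implementation. It assumes the static PNCC (or MFCC) coefficients have already been extracted into a (frames x coefficients) array, uses the common regression formula for deltas with a hypothetical window of 2 frames, and applies CMS/CVN per utterance after stacking; the abstract does not state the window size, the order of the steps, or whether second-order deltas are used. The function names are illustrative.

```python
import numpy as np


def delta(features, window=2):
    """Regression-based dynamic difference parameters.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Edge frames are handled by repeating the first and last frame.
    The window size N=2 is an assumption, not taken from the thesis.
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + num_frames]
        behind = padded[window - n : window - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def cms_cvn(features, eps=1e-8):
    """Per-utterance Cepstral Mean Subtraction and Variance Normalization."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


def add_dynamics_and_normalize(static):
    """Stack static, delta, and delta-delta features, then apply CMS/CVN.

    Using both first- and second-order deltas is an assumption.
    """
    d1 = delta(static)
    d2 = delta(d1)
    return cms_cvn(np.hstack([static, d1, d2]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    static = rng.normal(loc=3.0, scale=2.0, size=(200, 13))  # stand-in for PNCC
    feats = add_dynamics_and_normalize(static)
    print(feats.shape)
```

Normalizing after stacking gives every dimension, static and dynamic alike, zero mean and unit variance within the utterance, which is one common way to reduce stationary channel effects.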
Keywords/Search Tags: Speaker Recognition, Improved Power-Normalized Cepstral Coefficients, Log-likelihood Normalization, Probabilistic Linear Discriminant Analysis, Score Fusion