Training GMM-UBM Against Sliced Wasserstein Distance For Speaker Recognition

Posted on: 2020-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
Full Text: PDF
GTID: 2428330596992268
Subject: Computer technology
Abstract/Summary:
Speaker recognition, an important research area in speech signal processing and an important biometric technology, is valuable for many applications such as financial fraud prevention, mobile payment, criminal investigation, and identification in public security. The Gaussian mixture model-universal background model (GMM-UBM) is the most classical model in speaker recognition research. In GMM-UBM, the UBM is a high-order Gaussian mixture model that covers a broad range of speech features, so the parameter estimation of the Gaussian mixture model is extremely important. The expectation-maximization (EM) algorithm is the most commonly used parameter estimation method for Gaussian mixture models. However, the EM algorithm can only guarantee that the likelihood function converges to a local extremum, and from a random starting point it is quite likely to converge to a poor one. Although K-Means initialization can alleviate this problem, its effect is limited. As a result, EM training often fails to produce a good GMM-UBM, which lowers recognition accuracy. To overcome these limitations of the EM algorithm, this thesis proposes to estimate the parameters of GMM-UBM by optimizing the sliced Wasserstein distance. Because the optimization landscape induced by the sliced Wasserstein distance contains fewer local extrema, optimizing it with stochastic gradient descent makes it easier to obtain a better GMM-UBM and thus to improve the model's speaker recognition rate. The traditional EM algorithm and the proposed method are compared experimentally. The results show that, under different initialization methods, different numbers of mixture components, and different enrollment data, the proposed method improves the recognition rate noticeably, to varying degrees. On average, the recognition rate is about 5% higher than the better results obtained with the traditional EM-trained GMM-UBM.
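The abstract does not give the formulation used in the thesis, so the following is only a minimal sketch of how a sliced Wasserstein distance between two sample sets is commonly estimated: project both sets onto random directions, sort the one-dimensional projections, and average the squared differences. The function name `sliced_wasserstein_sq`, the parameter `n_projections`, and the requirement that both sets have the same number of samples are assumptions made for illustration, not details from the thesis.

```python
import numpy as np

def sliced_wasserstein_sq(x, y, n_projections=50, rng=None):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    x, y: arrays of shape (n, d) with the same n (an assumption of this sketch).
    """
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    # Draw random directions and normalise them onto the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project onto every direction, then sort each 1-D projection.
    # In one dimension the optimal transport plan pairs sorted samples.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    # Average the squared differences over samples and directions.
    return float(np.mean((px - py) ** 2))

# Hypothetical usage: compare feature frames with samples drawn from a model.
features = np.random.default_rng(0).normal(size=(500, 13))
model_samples = np.random.default_rng(1).normal(loc=0.5, size=(500, 13))
print(sliced_wasserstein_sq(features, model_samples, rng=0))
```

In a training setting along the lines described above, `model_samples` would be drawn from the current GMM and the distance minimized with respect to the GMM parameters by stochastic gradient descent. That requires an automatic differentiation framework and a differentiable sampling scheme, both of which this NumPy sketch leaves out.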
Keywords/Search Tags: GMM-UBM, Sliced Wasserstein Distance, Speaker Recognition, Training Parameter