
Research Of System And Key Technologies Of Speaker Verification Over Short Utterance In Realistic Environment

Posted on: 2017-01-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Li
GTID: 1368330590990823
Subject: Circuits and Systems
Abstract/Summary:
With the rapid progress of technology, mobile devices are playing an increasingly important role in people's daily lives. According to a survey in 2012, the market share of mobile devices had exceeded that of personal computers, which marked the full advent of the mobile Internet era. Smart devices, including smartphones, intelligent vehicle terminals, and smart home devices, are bringing more convenience and comfort to daily life. Compared with traditional electronic devices, smart devices offer richer and more powerful functionality, faster computation speed, and more storage space. To better accommodate users' habits, mobile devices tend to store more private information, so protecting the privacy and security of personal information has become an essential problem that mobile devices have to address. Compared with traditional passwords made of digits and symbols, speaker verification (or voiceprint) has advantages such as convenient sampling and higher security and confidentiality, which give it an increasingly essential role in protecting private information. With the continuing maturation of signal processing theory and the rapid development of pattern recognition, researchers have paid more attention to the stability of speaker verification systems in real-world application environments, and to how to further improve verification performance in these environments.

Against this background, we aim to construct a complete speaker verification system that has a flexible modular structure, is portable, and is highly adaptable to various application environments. Based on the Gaussian mixture model (GMM), which fits read speech data well, and the support vector machine (SVM), which discriminates non-linearly distributed data well, we carried out in-depth studies of the core modules of speaker verification, such as front-end pre-processing, feature extraction, model training and parameter optimization, channel compensation, and feature sparsity
compensation. Our results provide a solid foundation for applying speaker verification in multiple scenarios, such as information encryption, identity authentication, and security protection. The core contributions and innovations of this dissertation are as follows:

(1) Inspired by the Gaussian Mixture Model-Support Vector Machine (GMM-SVM) modeling algorithm applied in text-independent speaker verification, we propose an improved Hidden Markov Model-Support Vector Machine (HMM-SVM) modeling algorithm for text-dependent speaker verification. The finite state transition property of the HMM is used to represent temporal information in an utterance, and the GMM distribution of each Markov state can model both speaker identity and lexical information. In the back-end SVM classifier, we use speaker- and text-dependent HMM supervectors in place of speaker-dependent but text-independent GMM supervectors as training and testing samples. Good performance is obtained on the RSR 2015 text-dependent speaker verification dataset.

(2) We explore in depth the factor analysis based modeling algorithms applied in text-independent speaker verification, including Joint Factor Analysis (JFA), the total factor matrix based identity vector (i-vector), and i-vector modeling with Probabilistic Linear Discriminant Analysis (i-vector + PLDA). A large number of experiments and theoretical analyses are performed to compare the advantages and shortcomings of these modeling algorithms in different scenarios. We optimize the parameters of these modeling algorithms on the NIST SRE 2005, 2006 and 2008 training datasets, and good baseline performance is obtained on the NIST SRE 2008 evaluation dataset.

(3) We propose an i-vector based Component Reduction Analysis (CRA) algorithm, which improves the performance of the i-vector system when short training or testing utterances are involved. GMM and Factor Analysis (FA) are two typical examples of statistical models with hidden variables. Short-time speech features such as Mel
Frequency Cepstral Coefficients (MFCC) or Perceptual Linear Predictive (PLP) cepstral coefficients cannot directly enter the estimation of model parameters; Baum-Welch statistics have to be extracted from the speech features to estimate the model parameters and hidden variables. Our experiments show that, because of the diversity of culture, emotion, and other speaker-dependent factors, speech features are not evenly distributed in the GMM speaker space. From the viewpoint of Baum-Welch statistics, real-world speaker features lead to an imbalanced distribution of zero order Baum-Welch statistics. Under short utterance conditions, extremely low zero order Baum-Welch statistics lead to biased estimates of the first order Baum-Welch statistics, which harms i-vector extraction. Based on an analysis of the Baum-Welch statistics extracted with the background model, Gaussian components with very low zero order Baum-Welch statistics are discarded, while the remaining Gaussian components with reliable Baum-Welch statistics are retained and take part in the estimation of the i-vector. CRA stabilizes i-vector estimation, which leads to more robust performance of the speaker verification system.

(4) We propose a self-adaptive first order Baum-Welch statistics analysis (AFSA) algorithm, which improves the performance of i-vector based systems on short utterances. This algorithm only compensates for the feature sparsity caused by limited utterance duration, so AFSA is independent of the channel compensation modules. Building on the experimental analysis in (3), we further analyze the biased estimation of first order Baum-Welch statistics caused by feature sparsity. We find that the value of the zero order Baum-Welch statistics is positively correlated with the stability of the first order Baum-Welch statistics. When we extract statistics from a training utterance, feature sparsity may leave many Gaussian components with only very low zero
order Baum-Welch statistics, and the corresponding first order Baum-Welch statistics may deviate severely. We therefore propose an algorithm to optimize the Baum-Welch statistics. Adopting a Bayesian adaptation method, we construct a sufficient normalized Baum-Welch statistics space; the final Baum-Welch statistics used to estimate the i-vector are an optimized interpolation between reference normalized first order Baum-Welch statistics and the extracted normalized first order Baum-Welch statistics. AFSA retains the reliable statistics and compensates for the biased statistics caused by feature sparsity. Experimental results show that AFSA can improve the Equal Error Rate (EER) by up to 20% under short utterance conditions.

(5) We propose a binary coding score normalization algorithm for text-dependent speaker verification. In text-dependent speaker verification, confirming an identity requires matching both speaker identity and lexical content. However, the factor analysis based i-vector, JFA, and PLDA models all rely on estimating a low-dimensional hidden variable vector, and the principal low-dimensional space may make the speaker model converge quickly. Although the FA based framework shows superior performance in text-independent speaker verification, it may cause some information loss, especially when confirming lexical information. To compensate for this shortcoming of the FA framework, we propose a binary coding score normalization algorithm. A binary coding vector is extracted from the zero order Baum-Welch statistics of a training utterance. If the lexical contents of two utterances are clearly different, the corresponding binary coding vectors differ even when the two utterances come from the same speaker. The inner product of two binary coding vectors is used as a tuning factor for the SVM kernel function in the scoring phase, which brings lexical differences into the verification phase. No extra speech recognition engine is needed; a single speaker verification system can complete the
confirmation of both speaker identity and lexical content.

(6) We have developed security software based on speaker verification. We port a complete GMM-UBM verification engine to mobile devices, where a balance has to be reached between accuracy and computation speed. We also analyze the performance of an i-vector based online verification system. By adopting G.729 coding to compress the user's utterances, we can transmit data very quickly and save much of the mobile device's data traffic. As of the closing date of this dissertation, this voice lock app can be downloaded from the main Android markets and has received a good response from users.
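
The statistics discussed in contributions (3) and (4) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names (`baum_welch_stats`, `reliable_components`, `interpolate_first_order`), the `min_count` threshold, and the `relevance` factor are all hypothetical, and the interpolation weight borrows the classical relevance-factor form from MAP adaptation as a stand-in, since the abstract does not give the AFSA formula. The sketch assumes a diagonal-covariance GMM as the background model.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def baum_welch_stats(frames, weights, means, variances):
    """Return zero order stats N[c] and first order stats F[c] per component."""
    n_comp, dim = len(weights), len(means[0])
    N = [0.0] * n_comp
    F = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        log_p = [math.log(w) + gaussian_logpdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(log_p)  # subtract the max for numerical stability
        post = [math.exp(lp - top) for lp in log_p]
        total = sum(post)
        for c in range(n_comp):
            gamma = post[c] / total  # posterior of component c for this frame
            N[c] += gamma
            for d in range(dim):
                F[c][d] += gamma * x[d]
    return N, F

def reliable_components(N, min_count):
    """Indices of components whose zero order statistic reaches min_count
    (hypothetical threshold, in the spirit of component reduction)."""
    return [c for c, n in enumerate(N) if n >= min_count]

def interpolate_first_order(N, F, reference_means, relevance):
    """Blend the normalized first order statistic F[c] / N[c] with a reference
    mean, trusting the data more when N[c] is large (assumed weighting)."""
    blended = []
    for n, f, ref in zip(N, F, reference_means):
        alpha = n / (n + relevance)
        estimate = [fd / n for fd in f] if n > 0 else list(ref)
        blended.append([alpha * e + (1.0 - alpha) * r
                        for e, r in zip(estimate, ref)])
    return blended

# Toy example: a short "utterance" whose frames all fall near component 0.
weights = [0.5, 0.5]
means = [[0.0], [10.0]]
variances = [[1.0], [1.0]]
frames = [[0.1], [-0.2], [0.3]]

N, F = baum_welch_stats(frames, weights, means, variances)
kept = reliable_components(N, min_count=0.5)
blended = interpolate_first_order(N, F, means, relevance=3.0)
print(kept, blended)
```

In the toy example the second component receives almost no frames, so it is dropped by the threshold, and its blended estimate stays at the reference mean. This is the behavior the abstract describes for components with unreliable statistics.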
Keywords/Search Tags:HMM-SVM, Factor Analysis, i-vector, PLDA, Feature Sparsity Compensation, Local Gaussian Mixture Analysis