Research On Speaker Recognition Based On VPT And GMM

Posted on: 2015-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Lu
Full Text: PDF
GTID: 2268330431950092
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
The speech signal conveys many levels of information, such as the concepts expressed in words, the language being spoken, and the gender and identity of the speaker. Automatic speaker recognition is a technique that extracts the information in the speech signal that conveys speaker identity and uses it to identify the speaker. It belongs to the field of biometric authentication. After decades of development, speaker recognition is widely applied in fields such as Internet access control, telephone banking transaction authentication, and judicial security.

The commonly used approaches to speaker recognition fall into two categories: one is based on template matching; the other is based on probabilistic and statistical models. Template matching extracts feature vectors from the test speech and calculates their similarity to those extracted from the training speech. Template models are simple and cheap to compute, but their recognition accuracy is relatively low. Probabilistic methods use a specific probability density function (pdf) to describe the characteristics of each speaker; during recognition, the log-likelihood ratio of the feature vectors extracted from the test speech is calculated under that pdf. These models achieve a high recognition rate, but they are complex, so the amount of computation in both training and identification is very large. As the number of target speakers in a speaker recognition system grows, the time consumed by recognition increases rapidly, so recognition slows sharply and can no longer meet real-time requirements.

To address these shortcomings, this thesis puts forward a two-level speaker recognition model based on VQ-VPT and GMM-UBM, which splits recognition into two steps. First, a fast search finds the K target speaker voiceprint models that are most similar to the speaker to be identified. Then, the precise GMM-UBM model calculates the likelihood ratio of the test feature vectors for those candidates and makes the final decision.

The fast recognition model is based on VQ-VPT: the codebooks of all target speakers are built with the LBG algorithm from vector quantization, and all code vectors are indexed with a vantage point tree (VPT), a balanced binary tree. The search time complexity is logarithmic, so the tree is suitable for fast search. GMM-UBM is used as the precise recognition model to guarantee recognition accuracy, and a fast scoring approach further reduces the amount of computation. The two-level model combines the speed of the template matching method with the accuracy of the probabilistic method, and it improves recognition speed with limited performance loss.
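The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names (`build_vpt`, `nearest`, `shortlist`, `gmm_loglik`, `identify`) are hypothetical, and so are the frame-voting rule used to pick the K candidates and the use of diagonal-covariance GMMs. The sketch also assumes that each speaker's codebook (for example from LBG training) and each GMM already exist, and it omits the fast scoring step.

```python
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_vpt(items):
    """Build a vantage point tree over (code_vector, speaker_id) pairs."""
    if not items:
        return None
    vantage, rest = items[0], items[1:]
    if not rest:
        return {"vp": vantage, "mu": 0.0, "inner": None, "outer": None}
    rest = sorted(rest, key=lambda it: dist(vantage[0], it[0]))
    mid = len(rest) // 2
    mu = dist(vantage[0], rest[mid][0])  # median distance to the vantage point
    return {"vp": vantage, "mu": mu,
            "inner": build_vpt(rest[:mid]),   # distances <= mu
            "outer": build_vpt(rest[mid:])}   # distances >= mu

def nearest(node, q, best=None):
    """Return (distance, speaker_id) of the code vector closest to q."""
    if node is None:
        return best
    d = dist(q, node["vp"][0])
    if best is None or d < best[0]:
        best = (d, node["vp"][1])
    first, second = ("inner", "outer") if d < node["mu"] else ("outer", "inner")
    best = nearest(node[first], q, best)
    # Visit the other side only if the current search ball crosses the median shell.
    if abs(d - node["mu"]) <= best[0]:
        best = nearest(node[second], q, best)
    return best

def shortlist(tree, frames, k):
    """Stage 1 (assumed rule): each frame votes for the speaker owning its
    nearest code vector; return up to k speakers with the most votes."""
    votes = Counter(nearest(tree, f)[1] for f in frames)
    return [spk for spk, _ in votes.most_common(k)]

def gmm_loglik(frames, gmm):
    """Average per-frame log-likelihood under a diagonal-covariance GMM,
    given as a list of (weight, means, variances) components."""
    total = 0.0
    for x in frames:
        comps = []
        for w, mean, var in gmm:
            ll = math.log(w)
            for xi, m, v in zip(x, mean, var):
                ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comps.append(ll)
        mx = max(comps)  # log-sum-exp for numerical stability
        total += mx + math.log(sum(math.exp(c - mx) for c in comps))
    return total / len(frames)

def identify(tree, speaker_gmms, ubm, frames, k):
    """Stage 2: score only the shortlisted speakers by log-likelihood ratio
    against the UBM and return the best speaker with all candidate scores."""
    candidates = shortlist(tree, frames, k)
    ubm_ll = gmm_loglik(frames, ubm)
    scores = {s: gmm_loglik(frames, speaker_gmms[s]) - ubm_ll for s in candidates}
    return max(scores, key=scores.get), scores
```

In this sketch, `identify` evaluates at most k speaker GMMs per test utterance instead of one per enrolled speaker, which is where the speed-up of a two-level scheme would come from. The price is that the true speaker can be missed if the first stage leaves it off the shortlist.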
Keywords/Search Tags: speaker recognition, Vector Quantization, Vantage Point Tree, Gaussian Mixture Model, Universal Background Model