
Text-Independent Speaker Verification Based On SVM And Statistical Feature

Posted on: 2012-10-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Q Xu
GTID: 1118330335462494
Subject: Circuits and Systems

Abstract/Summary:
Speaker verification is an important field of speech recognition whose objective is to determine whether an utterance was spoken by a specific claimed speaker. Text-independent speaker verification, a user-friendly technology for identity authentication, human-computer interaction, and similar applications, has received growing research attention in recent years because of its promising role in the coming information society.

Speaker verification is a binary classification task, and the support vector machine (SVM) is a discriminative classifier that performs well on a wide range of classification tasks. There has been considerable interest in applying SVMs to speaker verification. However, the SVM is a small-sample machine learning algorithm. In contrast, utterances in text-independent speaker verification vary in length, and each is usually parameterized as a large set of short-term cepstral features. This feature set contains not only speaker characteristics but also a great deal of confounding information, such as phoneme content, and a single feature vector is meaningless for speaker verification if used as a support vector. As a result, an SVM has great difficulty handling such a set of training features.

This thesis presents several statistics-based feature extraction methods that transform the short-term cepstral features, which are numerous, noisy, and poor in speaker information, into statistical features, which are few in number, high-dimensional, and rich in speaker information. Such statistical features are well suited to SVMs and improve the performance of text-independent speaker verification.

First, this thesis proposes the MDK method, which extracts statistical features from the speaker's probability distribution of short-term cepstral features using Taylor's theorem.
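For reference, the univariate form of Taylor's theorem can be written as follows. The abstract does not give the exact form applied to the distribution of cepstral features (which would presumably be multivariate), so the symbols f, a, and n here are illustrative assumptions rather than the thesis's notation.

```latex
% Taylor's theorem (Peano form of the remainder), assuming f is n times
% differentiable at the point a:
f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(x-a)^{k} + R_n(x),
\qquad R_n(x) = o\!\left(|x-a|^{n}\right) \ \text{as } x \to a .
```

Read this way, the derivatives f^{(k)}(a) at a single point summarize the local shape of f, which is why two distributions with similar derivatives can be expected to be similar near that point.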
Taylor's theorem states that a function can be reconstructed within a region from its derivatives at a point in that region. For the speaker verification task, Taylor's theorem can be used to determine whether two utterances come from the same speaker by comparing the derivatives of their probability distributions. The MDK framework is as follows. First, short-term features are extracted from each utterance. Second, a Gaussian mixture model (GMM) is used to model the distribution of these features. Third, multiple derivatives are collected from the GMM distribution according to Taylor's theorem; these derivatives are the statistical features, referred to as MDK features. The MDK features are used as the SVM input and yield a reduction in equal error rate (EER) compared with other SVM-based systems.

Second, this thesis presents another novel moment-based statistical feature extraction method, named MStat. To extract MStat features, a speaker-independent universal template, used as a codebook, is first trained on a large number of utterances from many speakers. The moments of each utterance's short-term cepstral features are then obtained by decomposing the original cepstral features on this universal template, so that the speaker characteristics are represented as weights, means, variances, and other higher-order moments. Experiments show that the MStat features perform very well.

Different statistics reflect different aspects of speaker characteristics. Third, therefore, this thesis proposes a fusion method that combines the MDK-based and MStat-based systems. A multi-SVM system (MM_SVM) based on several statistical features is built using an empirical linear fusion method. Experimental results demonstrate that MM_SVM significantly outperforms the GMM-UBM method, with relative improvements of up to 42.0% and 28.0% for male and female speakers, respectively, on the NIST dataset.
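The idea of turning a variable-length set of frames into a fixed-length vector of weights, means, and variances relative to a shared codebook can be sketched as below. This is a minimal illustration and not the thesis's MStat implementation: the function name `moment_features`, the soft assignment based on equal-weight isotropic Gaussians, the `var` parameter, and the random stand-in data are all assumptions made for the example.

```python
import numpy as np


def moment_features(frames, codebook, var=1.0):
    """Map (T, D) frames to a fixed-length vector of per-codeword
    weights, means, and variances (hypothetical helper)."""
    frames = np.asarray(frames, dtype=float)
    codebook = np.asarray(codebook, dtype=float)

    # Squared Euclidean distance from every frame to every codeword: (T, K).
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)

    # Soft assignment of frames to codewords (assumed: equal-weight
    # isotropic Gaussians with shared variance `var`).
    logits = -0.5 * d2 / var
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(logits)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth-order statistic: soft counts normalized to weights, shape (K,).
    counts = post.sum(axis=0)
    weights = counts / frames.shape[0]
    safe = np.maximum(counts, 1e-10)[:, None]  # avoid division by zero

    # First-order moment: posterior-weighted mean per codeword, shape (K, D).
    means = (post.T @ frames) / safe

    # Second-order central moment: posterior-weighted variance, shape (K, D).
    second = (post.T @ frames ** 2) / safe
    variances = np.maximum(second - means ** 2, 0.0)

    return np.concatenate([weights, means.ravel(), variances.ravel()])


# Usage with random stand-in data: utterances of different lengths
# map to vectors of the same size, K + 2 * K * D.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 3))  # K = 4 codewords, D = 3 dimensions
short = moment_features(rng.normal(size=(50, 3)), codebook)
long_utt = moment_features(rng.normal(size=(500, 3)), codebook)
print(short.shape, long_utt.shape)
```

Because every utterance yields one vector of the same dimension regardless of its duration, such vectors can be passed directly to a classifier that expects fixed-size inputs, such as an SVM.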
Keywords/Search Tags: Statistical feature, Taylor's theorem, Moment, Text-independent, Speaker verification