
A Research On The Vocal Password System

Posted on: 2013-09-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Q Pan
Full Text: PDF
GTID: 1228330377451698
Subject: Signal and Information Processing
Abstract/Summary:
Among biometric authentication technologies, the vocal password has attracted growing attention because it offers two layers of protection (the password content and the speaker's voice) and because speech is convenient to produce, capture, and transmit over a telephone or network. In traditional vocal password systems, the password content and the speaker-specific voice characteristics are verified by speech recognition and speaker recognition, respectively, and this two-pass strategy readily achieves good performance. However, once an impostor has obtained the content of the enrolled password, the speech recognizer no longer helps and the false acceptance rate of a traditional system rises dramatically. This dissertation therefore focuses on vocal password systems in which the password is known and training utterances are limited. It presents a systematic study of this setting and introduces innovations in feature processing, model training, and the decision algorithm, as follows.

Firstly, this dissertation proposes a channel compensation technique named Feature Space Bias Estimation (FSBE). Because channel distortion appears as a linear function in the cepstral domain, traditional vocal password systems improve channel robustness with feature-domain methods such as cepstral mean subtraction (CMS), cepstral mean and variance normalization (CMVN), and double-Gaussian CDF matching. The weakness of these methods is their assumption that the test speech follows a single or double Gaussian distribution. FSBE discards that unrealistic assumption and projects the test speech onto the channel space of each Gaussian of the speaker model. It estimates the parameters of the corresponding linear functions by maximizing the likelihood of the test speech under the target or impostor speaker models.
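The idea of estimating a feature-space bias by maximum likelihood can be illustrated with a minimal sketch. It is not the dissertation's FSBE algorithm: it assumes the simplest possible linear function, a single additive bias shared by all frames, and a diagonal-covariance GMM as the speaker model. The function names (`estimate_bias`, `posteriors`, `log_gauss_diag`) and the EM update are illustrative assumptions.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means, variances):
    """Posterior probability of each GMM component given one frame."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def estimate_bias(frames, weights, means, variances, n_iter=10):
    """EM estimate of one additive bias b such that (x - b) best fits the GMM.

    Assumption: the channel is modeled as a single shared bias; the
    dissertation's per-Gaussian formulation is not reproduced here.
    """
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(n_iter):
        num = [0.0] * dim
        den = [0.0] * dim
        for x in frames:
            # E-step: align the compensated frame with the model components.
            shifted = [xi - bi for xi, bi in zip(x, bias)]
            gamma = posteriors(shifted, weights, means, variances)
            # Accumulate the precision-weighted residuals for the M-step.
            for g, m, v in zip(gamma, means, variances):
                for d in range(dim):
                    num[d] += g * (x[d] - m[d]) / v[d]
                    den[d] += g / v[d]
        # M-step: closed-form maximizer of the expected log-likelihood.
        bias = [n / d for n, d in zip(num, den)]
    return bias
```

In a verification setting, a bias of this kind would be estimated separately against the target model and the impostor model, and the compensated frames scored against each; that scoring step is omitted here.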
Experiments show that adding FSBE and its modified variant improves performance in the cross-channel condition.

Secondly, this dissertation proposes a novel modeling strategy named the Gaussian Mixture Frame Model (GMFM). Traditional modeling approaches fall into two kinds: non-parametric methods and parameter estimation methods. A speaker model of the first kind cannot describe the underlying distribution or intra-speaker variability; it only represents the limited structure of the enrollment data. The main disadvantage of the second kind is that parameters are difficult to estimate accurately when data are sparse. Although the GMM-UBM framework is known for handling that problem well, its performance still suffers from the approximation of the covariance matrix and the neglect of instantaneous characteristics. GMFM is proposed to overcome these disadvantages. In this method, the mean vectors are fixed directly to the corresponding sample vectors, and the covariance matrix is constrained to be either a single global diagonal matrix or a few cluster-level matrices and is then estimated under the maximum likelihood criterion. GMFM can thus be viewed as a hybrid of parametric and non-parametric modeling. As a result, it effectively addresses data sparsity and reflects both intra-speaker variability and instantaneous information. Experimental results suggest that the GMFM-based system outperforms the baselines.

Thirdly, this dissertation introduces discriminative training into the vocal password system to improve its robustness. Because it requires a large amount of data, traditional discriminative training has not yet been applied successfully to vocal passwords.
This dissertation presents a discriminative training framework built on a special pre-processing strategy: the original feature is transformed into a new one that represents the distance between the test speech and its corresponding pattern, so that all training data fall into two classes. This alleviates the data sparsity problem to some extent and allows two-class models to be trained discriminatively under the minimum classification error criterion. Fusing the resulting discriminative system with the GMM-UBM system yields consistent improvements in the experimental results.

Finally, this dissertation proposes a multi-dimensional feature classifier in the score domain of the vocal password system. Because it ignores differences in discriminative power among types of data, the widely used average likelihood ratio strategy harms system performance. To obtain a verification measure that weights the score of each type appropriately, we propose a score-domain strategy that classifies the frames with the UBM, combines the likelihood ratio scores of the classes into a new multi-dimensional feature, and then performs speaker verification with an SVM. With this measure, the vocal password problem becomes a two-class classification problem in the multi-dimensional feature space. Experiments show that the new scoring method achieves low error rates and robust performance in both co-channel and cross-channel conditions.
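The score-domain strategy described in the final paragraph can be sketched as follows. This is a hedged illustration rather than the dissertation's implementation: it assumes that each frame's class is the top-scoring UBM component, that each class score is the mean log-likelihood ratio of its frames, and that a GMM is a list of hypothetical `(weight, mean, variance)` tuples with diagonal covariance. The SVM stage is omitted; the returned vector is what such a classifier would take as input.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def component_logs(x, gmm):
    """Weighted log-likelihood of frame x under each GMM component."""
    return [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]

def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def class_llr_vector(frames, speaker_gmm, ubm):
    """Build one score per frame class instead of a single averaged score.

    Assumption: a frame's class is its most likely UBM component.
    Classes with no frames receive a neutral score of 0.0.
    """
    n_classes = len(ubm)
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for x in frames:
        ubm_logs = component_logs(x, ubm)
        cls = max(range(n_classes), key=lambda k: ubm_logs[k])
        # Frame-level log-likelihood ratio: speaker model versus UBM.
        llr = log_sum_exp(component_logs(x, speaker_gmm)) - log_sum_exp(ubm_logs)
        sums[cls] += llr
        counts[cls] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

Keeping the class scores separate, rather than averaging them into one number, is what lets a downstream two-class classifier learn a different weight for each type of frame.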
Keywords/Search Tags: Vocal Password, Speaker Verification, Gaussian Mixture Frame Model, Discriminative Training, Multi-dimension Classifier in Score Domain, GMM-UBM, Support Vector Machine