Statistical modeling of heterogeneous features for speech processing tasks

Posted on: 2010-08-01
Degree: Ph.D.
Type: Thesis
University: Stanford University
Candidate: Ferrer, Luciana
Full Text: PDF
GTID: 2448390002477461
Subject: Statistics
Abstract/Summary:
In this dissertation we describe novel approaches for the improvement of several stages of a sequence classification system. We present results on two tasks: speaker verification, the task of deciding whether a test sample corresponds to a certain target speaker; and nativeness classification, the task of deciding whether the speaker found in a test sample is a native speaker of the language he or she is speaking.

In this dissertation, we present a paradigm for transforming sequential features into fixed-length vectors that combines the advantages of generative and discriminative methods. Generative models are used to define the transform, and discriminative methods are used for classification of the resulting transformed observations. A set of prototype distributions is obtained using vector quantization on a labeled held-out set with a distortion measure that aims to minimize the classification error of the resulting transformation. The transform is obtained as the vector of posterior probabilities of the prototypes.

Prosody, the rhythmic and intonational aspect of speech, can be used to help solve many of the speech processing classification tasks. We apply the proposed transform to prosodic features, which present special challenges compared to the standard spectral features usually extracted from speech signals. The vectors resulting from the above transformation are modeled using support vector machines (SVMs). Results for speaker verification and nativeness classification comparing different approaches for the computation of the prototypes are presented. Results show that the optimal method for the extraction of the prototypes is highly dependent on the amount of data present in each sample, the number of samples used to train the SVMs and, possibly, the type of prosodic features being extracted.

Another contribution of the thesis is a general method for modeling prior information within the SVM framework. SVMs can be interpreted as a maximum a posteriori estimation of a model's parameters. In the standard formulation of SVM classification and regression, the prior distribution on the weight vector is implicitly assumed to be a multidimensional Gaussian with zero mean and identity covariance matrix. We relax the assumption that the covariance matrix is the identity matrix, allowing it to be a more general block diagonal matrix. In speaker verification this matrix can be estimated from a set of held-out speakers. We show relative improvements of 10% on the equal error rate of two speaker verification systems when using this method compared to the standard SVM approach.

Prosodic information may be just one of the information sources used to solve a certain speech classification problem. In general, many systems can be trained separately to perform the same classification task using different features or modeling techniques. The output of these individual systems can then be combined to obtain the final score which is then used to make the final decision. In this framework, individual systems are trained independently and their outputs combined by a simple function. In this dissertation, a method for training the individual systems to improve the performance of the final combined score is presented. The SVM objective function is modified to include a term that penalizes large values of a correlation coefficient between the system being trained and a pre-existing system with which the new system will be later combined. The new optimization problem can be transformed into a standard SVM problem with a new kernel that we call the anticorrelation kernel. A 20% relative gain is achieved on a combination of four speaker verification systems by using the proposed method when training the individual systems. (Abstract shortened by UMI.)
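As an illustration of the prior-covariance idea described in the abstract, the following is a minimal sketch and not the thesis implementation. It assumes a linear SVM with hinge loss, where a zero-mean Gaussian prior with covariance Sigma on the weight vector corresponds to the regularizer 0.5 * w^T Sigma^{-1} w. Under that assumption, the problem reduces to a standard SVM on features mapped through a Cholesky factor of Sigma, which is the same as using the kernel k(x, z) = x^T Sigma z. All function names are hypothetical, and the block-diagonal Sigma is random placeholder data, not a matrix estimated from held-out speakers.

```python
import numpy as np


def block_diag(blocks):
    """Assemble square blocks into one block-diagonal matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out


def prior_transform(X, sigma):
    """Map each row x of X to L^T x, where sigma = L L^T (Cholesky)."""
    L = np.linalg.cholesky(sigma)
    return X @ L


def svm_objective(w, b, X, y, C, sigma=None):
    """Hinge-loss objective with regularizer 0.5 * w^T sigma^{-1} w.

    With sigma=None the regularizer is the standard 0.5 * w^T w.
    """
    if sigma is None:
        reg = 0.5 * (w @ w)
    else:
        reg = 0.5 * (w @ np.linalg.solve(sigma, w))
    margins = y * (X @ w + b)
    return reg + C * np.sum(np.maximum(0.0, 1.0 - margins))


rng = np.random.default_rng(0)

# Placeholder block-diagonal prior covariance (two blocks, 2x2 and 3x3).
A = rng.normal(size=(2, 2))
B = rng.normal(size=(3, 3))
sigma = block_diag([A @ A.T + np.eye(2), B @ B.T + np.eye(3)])

# Placeholder data: 20 samples, 5 features, labels in {-1, +1}.
X = rng.normal(size=(20, 5))
y = np.where(rng.normal(size=20) > 0, 1.0, -1.0)

# Any weight vector w in the original space maps to v = L^{-1} w.
w = rng.normal(size=5)
b = 0.3
L = np.linalg.cholesky(sigma)
v = np.linalg.solve(L, w)
X_t = prior_transform(X, sigma)

# Objective with the general prior on the original features ...
obj_prior = svm_objective(w, b, X, y, C=1.0, sigma=sigma)
# ... equals the standard objective on the transformed features.
obj_standard = svm_objective(v, b, X_t, y, C=1.0)

# Equivalent kernel view: k(x, z) = x^T sigma z.
gram_transformed = X_t @ X_t.T
gram_kernel = X @ sigma @ X.T
```

Because the two objectives agree for every weight vector, any off-the-shelf SVM trainer run on the transformed features (or given the Gram matrix above as a precomputed kernel) would, under these assumptions, solve the problem with the non-identity prior.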
Keywords/Search Tags:Speaker verification, Classification, Individual systems, Features, Speech, SVM, Method, Using