Reducing Computation in Speaker Recognition Systems Using a Tree-Structured Universal Background Model

Posted on: 2015-07-23
Degree: Ph.D.
Type: Dissertation
University: New Mexico State University
Candidate: McClanahan, Richard Daniel
Full Text: PDF
GTID: 1478390017993276
Subject: Electrical engineering

Abstract/Summary:

State-of-the-art speaker recognition systems use speaker models derived from an adapted universal background model (UBM) in the form of a Gaussian mixture model (GMM). This is true of GMM-supervector systems, joint factor analysis systems, and, most recently, i-vector systems. In all of these systems, the calculation of posterior probabilities and of the sufficient statistics for the weight, mean, and covariance parameters is a computational bottleneck in both enrollment and testing. In this dissertation, we develop a method that uses a lower-resolution GMM hash, built from clusters of GMM-UBM mixture component densities, to reduce the computational load. In the adaptation step, we score the feature vectors against the hash, then calculate the a posteriori probabilities and update the statistics only for the mixture components belonging to the appropriate clusters.

Each cluster is a grouping of multivariate normal distributions and is modeled by a single multivariate normal distribution. The set of distributions representing the clusters therefore itself forms a GMM, referred to as the hash GMM, which can be considered a lower-resolution representation of the GMM-UBM. The mapping that associates each component of the hash GMM with components of the original GMM-UBM is referred to as a shortlist.

This research investigates various methods of clustering the components of the GMM-UBM to form hash GMMs.
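The shortlist-based adaptation step described above can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes diagonal-covariance components, and all names (`shortlisted_posteriors`, `hash_gmm`, `shortlist`, the dictionary fields) are hypothetical.

```python
import numpy as np

def log_gauss(x, means, log_dets, inv_vars):
    # Diagonal-covariance Gaussian log-densities for one frame x,
    # evaluated against all components at once (vectorized).
    d = means.shape[1]
    diff = x - means
    return -0.5 * (d * np.log(2 * np.pi) + log_dets
                   + np.sum(diff * diff * inv_vars, axis=1))

def shortlisted_posteriors(x, hash_gmm, ubm, shortlist, top_c=1):
    """Score frame x against the small hash GMM, keep the top_c
    winning hash components, and evaluate only the UBM components
    that the shortlist maps to them (illustrative sketch)."""
    h_ll = np.log(hash_gmm["w"]) + log_gauss(
        x, hash_gmm["mu"], hash_gmm["log_det"], hash_gmm["inv_var"])
    best = np.argsort(h_ll)[-top_c:]                 # winning clusters
    idx = np.unique(np.concatenate([shortlist[b] for b in best]))
    u_ll = np.log(ubm["w"][idx]) + log_gauss(
        x, ubm["mu"][idx], ubm["log_det"][idx], ubm["inv_var"][idx])
    post = np.exp(u_ll - u_ll.max())
    post /= post.sum()                               # normalize over the shortlist only
    return idx, post
```

The saving comes from scoring each frame against the few hash components plus one cluster's worth of UBM components, rather than against every component of the full GMM-UBM.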
Of the five methods presented, one, Gaussian mixture reduction, outperforms the others in reducing computation while preserving recognition accuracy. This method iteratively reduces the size of a GMM by successively merging pairs of component densities according to a metric based on the Kullback-Leibler divergence.

Evaluated with a Gaussian mean-supervector SVM system and a single-layer hash, our method achieves a factor of 2.77 reduction in a posteriori probability calculations with no loss in recognition accuracy when using a 250-component GMM-UBM. With a 1024-component UBM, clustering yields a 5x reduction in computation with no loss in accuracy, and a 10x reduction with less than 2.4% relative degradation in equal error rate (EER).

This hash system is then extended with a tree-structured GMM-UBM that applies Runnalls' Gaussian mixture reduction technique at multiple hierarchical layers to further reduce the number of probabilistic alignment calculations. With this tree-structured hash, computation is reduced by a factor of 14x with less than 5% relative EER degradation in a state-of-the-art i-vector system.

Keywords/Search Tags: Systems, Recognition, Speaker, GMM, Computation, Tree-structured, Using, Hash
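The Gaussian mixture reduction the abstract describes, repeatedly merging the pair of components whose merge is cheapest under a KL-based cost, can be sketched as follows. This is a simplified diagonal-covariance version using Runnalls' moment-preserving merge and log-determinant cost bound; the function names and the brute-force pair search are illustrative, not the dissertation's code.

```python
import numpy as np

def merge_pair(w1, mu1, var1, w2, mu2, var2):
    """Moment-preserving merge of two weighted diagonal Gaussians:
    the result preserves the pair's total weight, mean, and covariance."""
    w = w1 + w2
    a, b = w1 / w, w2 / w
    mu = a * mu1 + b * mu2
    diff = mu1 - mu2
    var = a * var1 + b * var2 + a * b * diff * diff
    return w, mu, var

def runnalls_cost(w1, var1, w2, var2, var_m):
    # Runnalls' upper bound on the KL discrepancy introduced by a merge;
    # in the diagonal case the log-determinant is the sum of log variances.
    return 0.5 * ((w1 + w2) * np.sum(np.log(var_m))
                  - w1 * np.sum(np.log(var1))
                  - w2 * np.sum(np.log(var2)))

def reduce_gmm(weights, means, variances, target):
    """Greedily merge the cheapest pair until `target` components remain.
    The O(M^3) pair search is fine for a sketch, not for production."""
    comps = list(zip(weights, means, variances))
    while len(comps) > target:
        best = None
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                w, mu, var = merge_pair(*comps[i], *comps[j])
                c = runnalls_cost(comps[i][0], comps[i][2],
                                  comps[j][0], comps[j][2], var)
                if best is None or c < best[0]:
                    best = (c, i, j, (w, mu, var))
        _, i, j, merged = best
        comps = [c for k, c in enumerate(comps) if k not in (i, j)]
        comps.append(merged)
    return comps
```

Applying such a reduction repeatedly, at successively coarser resolutions, is the idea behind the multi-layer tree-structured hash: each layer is a reduced GMM whose components shortlist the components of the finer layer below.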