
Efficient speaker recognition using speaker model clusters

Posted on: 2010-08-14
Degree: Ph.D.
Type: Dissertation
University: New Mexico State University
Candidate: Apsingekar, Vijendra Raj
Full Text: PDF
GTID: 1448390002471625
Subject: Applied Mathematics
Abstract/Summary:
Speaker recognition (SR) can be broadly classified into speaker identification (SI) and speaker verification (SV). The objective of speaker identification is to determine which voice sample from a set of known voice samples best matches the characteristics of an unknown input voice sample. The objective of speaker verification is to accept or reject a claim of identity based on a voice sample.

In large-population SI systems, likelihood computations between an unknown speaker's feature vectors and the speaker models can be very time-consuming and impose a bottleneck. For applications requiring fast SI, this is a recognized problem, and improvements in efficiency would be beneficial. In this dissertation, various methods for building speaker model clusters (SMCs) are proposed. After clustering, only the small proportion of speaker models that belong to the selected clusters is used in the likelihood computations, resulting in a significant speed-up with little to no loss in accuracy. In general, as the number of selected clusters is reduced, the identification accuracy decreases; however, this loss can be controlled through a proper trade-off.

Experiments are conducted on the TIMIT, NTIMIT and NIST-2002 speech corpora. These are widely used large-speaker-population corpora under various background channel conditions. Using the proposed SMCs along with other speed-up techniques, a speed-up factor of 150x with little loss in accuracy and a speed-up factor of 75x with no loss in accuracy could be achieved on all three corpora.

SMCs may also be used to improve the accuracy of SV systems by improving score normalization. Among the various proposed score normalizations, T- and Z-norm are the most widely used in SV systems. These normalizations require selecting a set of cohort models or utterances in order to estimate the impostor score distribution. In this dissertation, selection of cohorts using SMCs is also proposed and investigated.
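
As an illustration of the conventional T- and Z-norm mentioned above, the following minimal sketch shows how a raw verification score is shifted and scaled by impostor score statistics. The function names and the use of the population standard deviation are assumptions made for this example, and the cohort scores are simply passed in as lists; the abstract does not say how the dissertation picks cohorts from SMCs, so that step is not shown.

```python
from statistics import mean, pstdev


def normalize_score(raw_score, impostor_scores):
    """Shift and scale a raw score by impostor score statistics."""
    mu = mean(impostor_scores)
    # Population standard deviation is an assumption of this sketch.
    sigma = pstdev(impostor_scores)
    if sigma == 0.0:
        raise ValueError("impostor scores have zero spread")
    return (raw_score - mu) / sigma


def z_norm(raw_score, model_vs_impostor_utterances):
    # Z-norm: the statistics come from scoring impostor utterances
    # against the claimed speaker's model, so they can be precomputed.
    return normalize_score(raw_score, model_vs_impostor_utterances)


def t_norm(raw_score, test_vs_cohort_models):
    # T-norm: the statistics come from scoring the test utterance
    # against a set of cohort speaker models at test time.
    return normalize_score(raw_score, test_vs_cohort_models)
```

Both variants apply the same formula; they differ only in where the impostor scores come from, which is why the choice of cohort matters for the quality of the estimate.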
The proposed SMC-based T- and Z-normalization is evaluated against conventional T- and Z-normalization. In addition, three new normalization techniques, Delta-, DeltaT- and TC-norm, are proposed, which also use SMCs to estimate the normalization parameters.

Results show that both the equal error rate (EER) and the minimum decision cost function (DCF) can be lowered using SMC-based score normalization techniques. On the NIST-2002 corpus, the EER of 12.25% obtained with no normalization is reduced to as low as 7.0% with the SMC-based score normalization techniques.

Open-set speaker identification (OSI) experiments are also performed using SMCs, and the EER of 20.0% obtained with conventional methods is reduced to 8.4% using SMCs.
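
The cluster-based pruning idea summarized in this abstract for speaker identification can be sketched roughly as follows. This is a toy illustration, not the dissertation's method: each speaker model is assumed to be a single vector, each cluster is assumed to have a centroid, and a negative mean squared distance stands in for the log-likelihood. The names `score`, `identify` and `num_selected` are hypothetical.

```python
def score(model_vec, features):
    """Stand-in for a log-likelihood: negative mean squared distance."""
    total = sum(
        sum((f - m) ** 2 for f, m in zip(frame, model_vec))
        for frame in features
    )
    return -total / len(features)


def identify(features, clusters, num_selected):
    """Score only the speaker models in the best-matching clusters.

    clusters: list of (centroid, {speaker_id: model_vec}) pairs.
    Returns the best speaker id and the number of models evaluated.
    """
    # Rank clusters by how well their centroid matches the input.
    ranked = sorted(clusters, key=lambda c: score(c[0], features), reverse=True)
    best_id, best_score = None, float("-inf")
    evaluated = 0
    for _, members in ranked[:num_selected]:
        for speaker_id, model_vec in members.items():
            s = score(model_vec, features)
            evaluated += 1
            if s > best_score:
                best_id, best_score = speaker_id, s
    return best_id, evaluated
```

Lowering `num_selected` reduces the number of models evaluated, which is the source of the speed-up, at the risk of excluding the cluster that holds the true speaker. This mirrors the speed versus accuracy trade-off described in the abstract.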
Keywords/Search Tags:Speaker, Using, EER, SMC, Clusters