Font Size: a A A

Feature and model transformation techniques for robust speaker verification

Posted on:2006-06-10Degree:Ph.DType:Dissertation
University:Hong Kong Polytechnic University (People's Republic of China)Candidate:Yiu, Kwok KwongFull Text:PDF
GTID:1458390008967506Subject:Engineering
Abstract/Summary:
Speaker verification is to verify the identity of a speaker based on his or her own voice. It has potential applications in securing remote access services such as phonebanking and mobile-commerce. This dissertation addresses the robustness issue of speaker verification systems in three different angles: speaker modeling, feature transformation, and model transformation.; This dissertation begins with an investigation on the effectiveness of three kernel-based neural networks for speaker modeling. These networks include probabilistic decision-based neural networks (PDBNNs), Gaussian mixture models (GMMs), and elliptical basis function networks (EBFNs).; The effect of handset variation can be suppressed by transforming clean speech models to fit the handset-distorted speech. To this end, this dissertation proposes a model-based transformation technique that combines handset-dependent model transformation and reinforced learning. Specifically, the approach transforms the clean speaker model and clean background model to fit the distorted speech by using maximum-likelihood linear regression (MLLR), which is followed by adapting the transformed models via PDBNN's reinforced learning.; In addition to model-based approaches, handset variation can also be suppressed by feature-based approaches. Current feature-based approaches typically identify the handset being used as one of the known handsets in a handset database and use the a priori knowledge about the identified handset to modify the features. However, it will be much more practical and cost effective if handset detector-free systems are adopted. To this end, this dissertation proposes a blind compensation algorithm to handle the situation in which no a priori knowledge about the handset is available (i.e., a handset model which is not in the handset database is used). Specifically, a composite statistical model formed by the fusion of a speaker model and a background model is used to represent the characteristics of enrollment speech. Based on the difference between the claimant's speech and the composite model, a stochastic matching type of approach is proposed to transform the claimant's speech to a region close to the enrollment speech. Therefore, the algorithm can now estimate the transformation online without the necessity of detecting the handset types. (Abstract shortened by UMI.)...
Keywords/Search Tags:Speaker, Transformation, Model, Handset
Related items