
Transformation sharing strategies for MLLR speaker adaptation

Posted on: 2008-01-08  Degree: Ph.D  Type: Dissertation
University: University of Washington  Candidate: Mandal, Arindam  Full Text: PDF
GTID: 1448390005974886  Subject: Engineering
Abstract/Summary:
Maximum Likelihood Linear Regression (MLLR) estimates linear transformations of automatic speech recognition (ASR) model parameters and has achieved significant performance improvements in speaker-independent ASR systems by adapting them to target speakers. Evidence presented in this dissertation shows that these improvements are not consistent across target speakers: roughly 15% of speakers suffer performance degradation, i.e., an increase in word error rate (WER). The robustness of MLLR adaptation is therefore an important problem, and solutions to it are crucial for ASR systems that must adapt to a wide range of speakers. This dissertation presents new research directions that address this problem, exploring two aspects of MLLR transformation sharing via a regression class tree (RCT): the design of RCTs and online complexity control of adaptation.

The standard approach to MLLR transformation sharing uses a single speaker-independent RCT. A new approach is proposed that uses multiple RCTs, each trained on speaker-cluster-specific data and representing a different type of speaker variability, where the clusters are obtained by partitioning a large corpus of speakers in the eigenspace of their MLLR transformations. ASR experiments show that choosing the appropriate RCT for a target speaker leads to significant reductions in WER. For unsupervised adaptation, an algorithm is proposed that linearly combines MLLR transformations from the cluster-specific RCTs, with combination weights estimated by maximizing the likelihood of the adaptation data; it achieves small improvements in WER on several English and Mandarin tasks. More significantly, a distributional analysis shows that it reduces the number of speakers whose performance degrades under adaptation, across a range of adaptation-data amounts and WER levels.

The standard approach to complexity control in MLLR uses only the amount of adaptation data available from a target speaker. Evidence is presented that this does not yield the optimal number of regression classes, and that significant improvements in WER are possible with the oracle number of regression classes. A new solution for complexity control is proposed that predicts the number of regression classes in an RCT from speaker-level features using standard statistical classifiers, achieving moderate improvements in WER. A more flexible approach is then proposed that performs node-level pruning of the RCT using node-level features, further improving the robustness of MLLR adaptation.
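The multi-RCT approach described above rests on clustering speakers in the eigenspace of their MLLR transformations. Below is a minimal sketch of that clustering step, assuming each speaker's transform has already been estimated as a d x (d+1) matrix; the array shapes, the number of clusters, and the use of scikit-learn's PCA and KMeans are illustrative assumptions, not the dissertation's exact procedure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_speakers_by_mllr(transforms, n_components=10, n_clusters=4, seed=0):
    """Partition speakers in the eigenspace of their MLLR transforms.

    transforms: list of S speaker-specific MLLR matrices, each shaped (d, d+1)
                (bias column stacked with the d x d linear part).
    Returns (labels, pca, kmeans) so new speakers can be projected and assigned.
    """
    # Flatten each speaker's transform into a single feature vector.
    X = np.stack([W.reshape(-1) for W in transforms])           # (S, d*(d+1))

    # "Eigenspace of the MLLR transformations": keep the leading
    # principal components of the flattened transforms.
    pca = PCA(n_components=n_components).fit(X)
    Z = pca.transform(X)                                         # (S, n_components)

    # One cluster per speaker type; each cluster's data would then be
    # used to train its own regression class tree (RCT).
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(Z)
    return kmeans.labels_, pca, kmeans

# Toy usage: 100 speakers, 39-dimensional acoustic features.
rng = np.random.default_rng(0)
demo_transforms = [np.hstack([np.eye(39) + 0.01 * rng.standard_normal((39, 39)),
                              rng.standard_normal((39, 1))]) for _ in range(100)]
labels, pca, km = cluster_speakers_by_mllr(demo_transforms)
print(np.bincount(labels))
```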
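For unsupervised adaptation, the abstract describes linearly combining the cluster-specific MLLR transforms with weights chosen to maximize the likelihood of the adaptation data. The sketch below illustrates that idea for a single regression class of diagonal-covariance Gaussians; the softmax parameterization of the weights, the use of scipy.optimize, and the single-class simplification are assumptions made here for brevity, not the dissertation's estimation procedure.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def combine_mllr_transforms(cluster_transforms, means, variances, priors, frames):
    """Estimate interpolation weights for cluster-specific MLLR transforms.

    cluster_transforms: (K, d, d+1) transforms from the K cluster RCTs,
                        each in the convention mu' = W @ [1, mu] (bias first column).
    means, variances:   (M, d) diagonal-Gaussian parameters of one regression class.
    priors:             (M,) mixture weights.
    frames:             (N, d) adaptation data.
    Returns (weights, combined_transform).
    """
    W = np.asarray(cluster_transforms)                            # (K, d, d+1)
    ext_means = np.hstack([np.ones((means.shape[0], 1)), means])  # (M, d+1)

    def neg_log_likelihood(theta):
        lam = np.exp(theta - logsumexp(theta))        # softmax -> weights on the simplex
        W_comb = np.tensordot(lam, W, axes=1)         # (d, d+1) combined transform
        adapted = ext_means @ W_comb.T                # (M, d) adapted means
        # Diagonal-Gaussian log densities for every (frame, component) pair.
        diff = frames[:, None, :] - adapted[None, :, :]
        log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
        log_p = log_norm[None, :] - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
        return -np.sum(logsumexp(log_p + np.log(priors)[None, :], axis=1))

    res = minimize(neg_log_likelihood, x0=np.zeros(W.shape[0]), method="Nelder-Mead")
    lam = np.exp(res.x - logsumexp(res.x))
    return lam, np.tensordot(lam, W, axes=1)
```

In a full system the weights would be estimated per regression class (or shared across classes) and the combined transform applied to all Gaussian means in that class before decoding; only the weight-estimation idea is sketched here.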
Keywords/Search Tags: Adaptation, Transformation sharing, ASR systems, RCT, Regression, Speaker, Performance