
Rapid Speaker Normalization and Adaptation with Applications to Automatic Evaluation of Children's Language Learning Skills

Posted on: 2011-11-08
Degree: Ph.D
Type: Dissertation
University: University of California, Los Angeles
Candidate: Wang, Shizhen
GTID: 1447390002467462
Subject: Engineering
Abstract/Summary:
This dissertation investigates speaker variation in automatic speech recognition (ASR), with a focus on rapid speaker normalization and adaptation methods that use limited enrollment data from the speaker. The work concentrates on reducing spectral variation through frequency warping.

Two methods are developed, one based on supraglottal (vocal tract) resonances (formants), and the other on resonances of the subglottal airways. The first method reshapes (warps) the spectrum by aligning corresponding formant peaks. Because formant structures vary at several levels, regression-tree based phoneme- and state-level spectral peak alignment is studied for rapid speaker adaptation, using a linearization of the vocal tract length normalization (VTLN) technique. This method is investigated in a maximum likelihood linear regression (MLLR)-like framework, taking advantage of both the efficiency of frequency warping (VTLN) and the reliability of statistical estimation (MLLR). Two different kinds of regression classes are investigated: one based on phonetic classes (using combined knowledge-based and data-driven techniques) and the other based on Gaussian mixture classes.

The second approach uses subglottal resonances, which have been shown to affect the spectral properties of speech sounds. A reliable algorithm is developed to automatically estimate the second subglottal resonance (Sg2) from speech signals. The algorithm is calibrated on children's speech data with simultaneous accelerometer recordings, from which Sg2 frequencies can be measured directly. A cross-language study with bilingual Spanish-English children is performed to investigate whether Sg2 frequencies are independent of speech content and language. The study verifies that Sg2 is approximately constant for a given speaker and is therefore a good candidate for limited-data speaker normalization and cross-language adaptation. A speaker normalization method using Sg2 is then presented.

As an application, ASR techniques are applied to automatically evaluate children's phonemic awareness through three blending tasks (phoneme blending, onset-rhyme blending and syllable blending). The system incorporates speaker normalization, disfluency detection and Spanish accent detection, together with speech recognition, to assess the overall quality of children's speech productions.
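
The frequency warping that both methods rely on can be illustrated with a generic piecewise-linear warp of the kind commonly used for VTLN. This is a minimal sketch under stated assumptions, not the dissertation's algorithm: the function names, the 8000 Hz Nyquist frequency, the 0.85 knee ratio, and the idea of taking the warp factor as a simple ratio of a reference resonance to the speaker's resonance are all hypothetical choices made for illustration.

```python
import numpy as np


def piecewise_linear_warp(freqs_hz, alpha, nyquist_hz=8000.0, knee_ratio=0.85):
    """Warp a frequency axis by factor alpha (hypothetical helper).

    Below a knee frequency the axis is scaled by alpha; above it a second
    linear segment is used so that the Nyquist frequency maps to itself.
    """
    freqs = np.asarray(freqs_hz, dtype=float)
    # Place the knee so that its warped position never exceeds
    # knee_ratio * nyquist_hz, whether alpha is above or below 1.
    knee = knee_ratio * nyquist_hz * min(1.0, 1.0 / alpha)
    warped_knee = alpha * knee
    # Slope of the upper segment, chosen so that nyquist_hz -> nyquist_hz.
    upper_slope = (nyquist_hz - warped_knee) / (nyquist_hz - knee)
    return np.where(
        freqs <= knee,
        alpha * freqs,
        warped_knee + upper_slope * (freqs - knee),
    )


def warp_factor_from_resonance(speaker_hz, reference_hz):
    """Assumed mapping: scale the speaker's resonance onto a reference value."""
    return reference_hz / speaker_hz
```

With `alpha = 1.0` the warp is the identity. In this sketch a speaker whose resonance lies above the reference gets `alpha < 1`, which compresses the lower part of the frequency axis toward the reference speaker's scale.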
Keywords/Search Tags:Normalization, Speech, Children's, Adaptation