Rapid Speaker Normalization and Adaptation with Applications to Automatic Evaluation of Children's Language Learning Skills

Posted on:2011-11-08

Degree:Ph.D

Type:Dissertation

University:University of California, Los Angeles

Candidate:Wang, Shizhen


GTID:1447390002467462

Subject:Engineering

Abstract/Summary:


This dissertation investigates speaker variation in automatic speech recognition (ASR), with a focus on rapid speaker normalization and adaptation methods that use limited enrollment data from the speaker. The investigations aim to reduce spectral variation through frequency warping.

Two methods are developed, one based on the supraglottal (vocal tract) resonances (formants), and the other on resonances from the subglottal airways. The first method reshapes (warps) the spectrum by aligning corresponding formant peaks. Since formant structures vary at several levels, regression-tree based phoneme- and state-level spectral peak alignment is studied for rapid speaker adaptation, using a linearization of the vocal tract length normalization (VTLN) technique. This method is investigated in a maximum likelihood linear regression (MLLR)-like framework, taking advantage of both the efficiency of frequency warping (VTLN) and the reliability of statistical estimation (MLLR). Two kinds of regression classes are investigated: one based on phonetic classes (using combined knowledge-based and data-driven techniques) and the other based on Gaussian mixture classes.

The second approach utilizes subglottal resonances, which have been shown to affect the spectral properties of speech sounds. A reliable algorithm is developed to automatically estimate the second subglottal resonance (Sg2) from speech signals. The algorithm is calibrated on children's speech data with simultaneous accelerometer recordings from which Sg2 frequencies can be directly measured. A cross-language study with bilingual Spanish-English children is performed to investigate whether Sg2 frequencies are independent of speech content and language. The study verifies that Sg2 is approximately constant for a given speaker and is therefore a good candidate for limited-data speaker normalization and cross-language adaptation. A speaker normalization method using Sg2 is then presented.

As an application, ASR techniques are applied to automatically evaluate children's phonemic awareness through three blending tasks (phoneme blending, onset-rhyme blending and syllable blending). The system incorporates speaker normalization, disfluency detection and Spanish accent detection, together with speech recognition, to assess the overall quality of children's speech productions.
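
As a rough illustration of the frequency-warping idea the abstract refers to, the following is a minimal sketch of a generic piecewise-linear VTLN-style warp, not the dissertation's actual method. It assumes, purely for illustration, that the warp factor is the ratio of a reference Sg2 to the speaker's Sg2; the function names, the knee ratio of 0.85, and the example frequencies are all hypothetical.

```python
def warp_factor(sg2_speaker_hz, sg2_reference_hz):
    """Hypothetical warp factor: reference Sg2 divided by the speaker's Sg2.

    The direction of this ratio is an assumption made for this sketch.
    """
    return sg2_reference_hz / sg2_speaker_hz


def piecewise_linear_warp(freq_hz, alpha, nyquist_hz, knee_ratio=0.85):
    """Map one frequency through a two-segment piecewise-linear warp.

    Below the knee the frequency is scaled by alpha. Above the knee a second
    linear segment is used so that the Nyquist frequency maps onto itself,
    which keeps the warped axis inside the original band.
    """
    # Place the knee so that both the knee and its warped image stay
    # below Nyquist, whether alpha is above or below 1.
    knee = knee_ratio * nyquist_hz * min(1.0, 1.0 / alpha)
    if freq_hz <= knee:
        return alpha * freq_hz
    # Slope of the upper segment, chosen so that nyquist_hz -> nyquist_hz.
    upper_slope = (nyquist_hz - alpha * knee) / (nyquist_hz - knee)
    return alpha * knee + upper_slope * (freq_hz - knee)


if __name__ == "__main__":
    # Made-up Sg2 values, used only to exercise the functions.
    alpha = warp_factor(sg2_speaker_hz=1600.0, sg2_reference_hz=1400.0)
    for f in (500.0, 1000.0, 4000.0, 8000.0):
        print(f, "->", piecewise_linear_warp(f, alpha, nyquist_hz=8000.0))
```

In a recognizer, a warp of this kind would typically be applied to the filterbank center frequencies before feature extraction; the abstract's first method instead works with a linearization of VTLN inside an MLLR-like framework, which this sketch does not attempt to reproduce.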

Keywords/Search Tags:

Normalization, Speech, Children's, Adaptation


Related items

1	A controversial disability and its impact on parents: Understanding the adaptation process of parents of children diagnosed with developmental apraxia of speech
2	Research On The Current Situation And Strategies Of Students’ Enrollment Adaptation In The Context Of The Connection Between Primary And Young Children
3	The Study Of Private Speech Phenomenon From Children Aged 2-3
4	Teacher-directed Speech Behavior Research On Young Children In Mixed-age Education Practice
5	The Impact Of Subject Classification Scheme On Field Normalization Effects
6	Study On Features Of Hearing-Speech And Their Relationship For Children With Hearing Impairments And The Training Strategy
7	Study On The Adaptation Of Schools For Migrant Children In Ethnic Minorities
8	Study On The Teaching Of Chinese Speech And Speech In Junior High School
9	The Communicative Acts In 4-5 Years Old Children With Mental Retardation: A Pragmatic Research In Mother-child's Communicative Interactive
10	Research On Normalization Of College Students’ Ideal And Belief Education