
Semi-supervised model selection with applications to speech recognition

Posted on: 2010-05-27
Degree: Ph.D.
Type: Thesis
University: The Johns Hopkins University
Candidate: White, Christopher M.
Full Text: PDF
GTID: 2448390002477679
Subject: Engineering
Abstract/Summary:
Conventional methods for model selection in statistical pattern recognition systems typically maximize empirical performance on a labeled development set whose statistics are expected to be representative of the test data. Such labeled data are often very expensive to obtain, and at other times cannot be obtained by system designers before the system is deployed.

We investigate methods that use unlabeled development data to automatically select between alternative trained systems through two semi-supervised procedures: likelihood-ratio-based model selection and disagreement-based model selection.

In the likelihood-ratio framework, each alternative system's automatic labeling of the development data yields a large, mixed set of "good" and "contaminated" observations. We take the view that the system assigning higher likelihood to correctly labeled development data has a model closer to the true data-generating distribution. Since it is not known which of the automatic labels are contaminated, we compare the systems using a censored likelihood ratio, a comparison inspired by results in nonparametric hypothesis testing for the limiting case of highly contaminated observations. We empirically validate the method in a large-scale, state-of-the-art automatic speech recognition (ASR) system by selecting between alternative candidate pronunciations using only large volumes of un-transcribed speech that potentially contain instances of these words, i.e., without requiring any labeled instances.

In the disagreement-based framework, we use an imperfect yet readily available automatic system, independent of the alternative systems being considered for selection, to generate labels for the development data. We demonstrate that despite errors in this labeling, one may still perform model selection based on the empirical error rate (disagreement) with respect to this sloppy reference.
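The censored comparison can be sketched in a few lines. This is only a toy illustration under assumed details: the candidate models are one-dimensional Gaussians, and the censoring rule shown (clipping each per-observation log-likelihood ratio to a bounded range so that a few contaminated observations cannot dominate the sum) is a standard robust-statistics device, not necessarily the exact rule developed in the thesis. All function names and parameters here are hypothetical.

```python
import math
import random

def gauss_loglik(x, mu, sigma):
    """Log-likelihood of x under a Normal(mu, sigma) model."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def censored_llr_select(data, model_a, model_b, c=2.0):
    """Pick the model favored by the censored log-likelihood ratio.

    Each per-observation log-ratio is clipped to [-c, c], so grossly
    contaminated (mislabeled) observations contribute at most a bounded
    amount to the comparison.  Returns 'A' if model_a is preferred.
    """
    total = 0.0
    for x in data:
        llr = gauss_loglik(x, *model_a) - gauss_loglik(x, *model_b)
        total += max(-c, min(c, llr))  # censor the per-sample ratio
    return 'A' if total > 0 else 'B'

random.seed(0)
# 90% of the data comes from N(0, 1) (matching model A); 10% is gross
# contamination far from both candidate models.
data = [random.gauss(0, 1) for _ in range(900)] + \
       [random.gauss(8, 1) for _ in range(100)]
choice = censored_llr_select(data, model_a=(0.0, 1.0), model_b=(3.0, 1.0))
```

Without censoring, the 10% of contaminated points would contribute very large negative log-ratios; clipping bounds their influence, so the majority of clean observations decides the comparison.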
Theoretical results are derived to establish that if the errors in the sloppy reference are uncorrelated with those of the pair of system alternatives being compared, then successful selection is feasible. We show that the probability of error in system selection decreases exponentially in the number of sloppily labeled samples, though the exponent is smaller than in the case of correctly labeled samples by a factor that depends on the error rate of the sloppy labeler. We validate this method by demonstrating successful adjustment of the language-model scale factor of an ASR system using only un-transcribed speech; the method selects a nearly optimal scale factor even when the ASR system used as the automatic labeler has a fairly high error rate (approximately 37%).

The methods presented here address a problem of growing importance in ASR, namely enabling systems to adjust automatically to varying test conditions using only unlabeled speech collected under those conditions. We expect that they will also find applications in other areas of pattern recognition.
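A minimal simulation makes the disagreement-based argument concrete. It assumes binary labels and a sloppy reference whose errors are independent of both systems' errors (the uncorrelated-errors condition above); the function names, sample size, and error rates (including the 37% reference error, echoing the ASR labeler above) are illustrative, not taken from the thesis's experiments.

```python
import random

def select_by_disagreement(out_a, out_b, sloppy_ref):
    """Pick the system that disagrees less often with an errorful reference."""
    dis_a = sum(a != r for a, r in zip(out_a, sloppy_ref))
    dis_b = sum(b != r for b, r in zip(out_b, sloppy_ref))
    return 'A' if dis_a <= dis_b else 'B'

def corrupt(labels, err):
    """Flip each binary label independently with probability err."""
    return [1 - y if random.random() < err else y for y in labels]

random.seed(1)
n = 20000
truth = [random.randint(0, 1) for _ in range(n)]

out_a = corrupt(truth, 0.10)   # better system: 10% true error rate
out_b = corrupt(truth, 0.20)   # worse system: 20% true error rate
sloppy = corrupt(truth, 0.37)  # errorful reference, errors independent of both

choice = select_by_disagreement(out_a, out_b, sloppy)
```

Because the reference's errors are independent of both systems, they inflate both measured disagreement rates by the same expected amount, so the ordering of the two systems is preserved; only the gap shrinks, which is why more sloppily labeled samples are needed than correctly labeled ones for the same selection confidence.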
Keywords/Search Tags: Model selection, Recognition, Speech, Labeled, System, Development, ASR