
Compact representations and unsupervised training of discriminative language models

Posted on: 2014-12-07
Degree: Ph.D
Type: Dissertation
University: The Johns Hopkins University
Candidate: Xu, Puyang
Full Text: PDF
GTID: 1458390005488643
Subject: Computer Science
Abstract/Summary:
Statistical language models (LMs) are a crucial component of automatic speech recognition (ASR) systems: they assign a priori probabilities to the candidate word sequences under consideration by the system. Conventionally, an LM is trained from a text corpus using standard statistical criteria such as maximum likelihood (ML). Discriminative training of an LM, by contrast, entails using an initial ASR system to identify a set of competing candidate transcriptions for each utterance in a speech corpus and adjusting the LM parameters to favor the correct transcriptions over the incorrect candidates. A discriminatively trained language model (DLM) is demonstrably complementary to an ML-trained model in improving ASR accuracy.

This dissertation addresses two important obstacles to the widespread use of DLMs: they must store a much larger number of parameters than a typical ML-trained model, and they require transcribed speech to estimate those parameters.

DLMs tend to have far more parameters than ML-trained LMs, mainly because they capture statistical information from an enormous number of incorrect ASR hypotheses in addition to statistics from the correct transcriptions. Their memory footprint is therefore often prohibitively large. Three novel techniques are proposed to represent DLMs compactly: feature randomization that results in parameter sharing, re-parameterization of the DLM as a convolutional neural network, and phone-level rather than word-level parameterization of the DLM. All three techniques reduce the size of the model by orders of magnitude with negligible loss in performance.

Unsupervised training methods for DLMs, i.e., discriminative training methods that do not require transcribed speech, are also developed by observing that the core requirement of discriminative training is a set of incorrect competitors for each (correct) sentence in a text corpus. A novel approach for simulating competitors is proposed that uses phrasal cohorts: alternative, acoustically confusable phrases that the ASR system is likely to consider for any phrase in the original sentence. Competing candidate transcriptions can thus be generated from text alone, without transcribed speech. The efficacy of this approach is investigated on a range of state-of-the-art ASR systems. It is demonstrated empirically that, depending on the underlying ASR system, unsupervised discriminative training using simulated confusions achieves between 15% and 60% of the improvement obtained by supervised discriminative training of language models.
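To make the two central ideas concrete, the sketch below shows a minimal, hypothetical Python implementation of perceptron-style discriminative LM training combined with hashed n-gram features (feature randomization that forces parameter sharing). All function names, the bucket count, and the toy data are illustrative assumptions, not the dissertation's actual implementation; the competitor list stands in for ASR hypotheses, or for simulated phrasal cohorts in the unsupervised setting.

```python
# Minimal sketch (assumptions, not the dissertation's code): a DLM scores a
# hypothesis by summing weights of its n-gram features; feature randomization
# hashes every feature into a fixed-size weight vector, so many features
# share one parameter and the model stays compact.

import hashlib

NUM_BUCKETS = 2 ** 20          # size of the shared weight vector (assumed)
weights = [0.0] * NUM_BUCKETS


def ngram_features(words, order=2):
    """Extract word n-grams up to `order` from a hypothesis."""
    feats = []
    for n in range(1, order + 1):
        for i in range(len(words) - n + 1):
            feats.append(" ".join(words[i:i + n]))
    return feats


def bucket(feature):
    """Map a feature string to a shared weight index (randomized sharing)."""
    digest = hashlib.md5(feature.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def score(words):
    """DLM score of a hypothesis: sum of its hashed feature weights."""
    return sum(weights[bucket(f)] for f in ngram_features(words))


def perceptron_update(reference, hypotheses, lr=1.0):
    """One discriminative update: promote the reference transcription,
    demote the current best-scoring competitor if it is wrong."""
    best = max(hypotheses, key=score)
    if best == reference:
        return
    for f in ngram_features(reference):
        weights[bucket(f)] += lr
    for f in ngram_features(best):
        weights[bucket(f)] -= lr


# Toy usage: in supervised training the competitors come from ASR output;
# in the unsupervised setting they would be simulated phrasal cohorts.
ref = "the cat sat on the mat".split()
competitors = ["the cat sat on the mass".split(),
               "a cat sat on the mat".split()]
perceptron_update(ref, [ref] + competitors)
```

The hashing step is what yields the compact representation: the number of stored parameters is fixed by the bucket count rather than by the (much larger) number of distinct features observed in correct and incorrect hypotheses.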
Keywords/Search Tags:Model, Language, Training, Discriminative, ASR, Unsupervised, Speech