Font Size: a A A

Analyzing and improving statistical language models for speech recognitio

Posted on:1995-02-14Degree:Ph.DType:Thesis
University:Simon Fraser University (Canada)Candidate:Ueberla, Joerg PeterFull Text:PDF
GTID:2478390014490288Subject:Computer Science
Abstract/Summary:
A speech recognizer is a device that translates speech into text. Many current speech recognizers contain two components, an acoustic model and a statistical language model. The acoustic model indicates how likely it is that a certain word corresponds to a part of the acoustic signal (e.g. the speech). The statistical language model indicates how likely it is that a certain word will be spoken next, given the words recognized so far. Even though the acoustic model might for example not be able to decide between the acoustically similar words "peach" and "teach", the statistical language model can indicate that the word "peach" is more likely if the previously recognized words are "He ate the".;Current speech recognizers perform well on constrained tasks, but the goal of continuous, speaker independent speech recognition in potentially noisy environments with a very large vocabulary has not been reached so far. How can statistical language models be improved so that more complex tasks can be tackled? This is the question addressed in this thesis.;Since the knowledge of the weaknesses of any theory often makes improving the theory easier, the central idea of this thesis is to analyze the weaknesses of existing statistical language models in order to subsequently improve them. To that end, we formally define a weakness of a statistical language model in terms of the logarithm of the total probability, LTP, a term closely related to the standard perplexity measure used to evaluate statistical language models. This definition is applicable to many probabilistic models, including almost all of the currently used statistical language models.;We apply our definition of a weakness to a frequently used statistical language model, called a bi-pos model. This results, for example, in a new modeling of unknown words which improves the performance of the model by 14% to 21%. Moreover, one of the identified weaknesses has prompted the development of our generalized N-pos language model, which is also outlined in this thesis. It can incorporate linguistic knowledge even if it extends over many words and this is not feasible in a traditional N-pos model. This leads to a discussion of what knowledge should be added to statistical language models in general and we give criteria for selecting potentially useful knowledge. These results show the usefulness of both our definition of a weakness and of performing an analysis of weaknesses of statistical language models in general.
Keywords/Search Tags:Statistical language, Speech, Acoustic, Weaknesses
Related items