
Graphical models for large vocabulary speech recognition

Posted on: 2009-11-09
Degree: Ph.D.
Type: Thesis
University: University of Washington
Candidate: Bartels, Chris Dennis
Full Text: PDF
GTID: 2448390002495268
Subject: Engineering
Abstract/Summary:
This thesis presents triangulation methodology and new graphical models for automatic speech recognition. The improved triangulation techniques presented here can lower the computational cost of exact probabilistic inference in graphical models. The thesis focuses on finding triangulations of graphical models used in speech and language applications. The triangulation procedures developed in the graphical model community do not address two aspects of such graphs.

The first aspect is that the graphs have a high degree of determinism. It is shown that in the presence of determinism the optimal triangulation can lie completely outside the search space of the most widely adopted triangulation techniques. It is also demonstrated that, when determinism is present, certain large-clique graph triangulations can outperform triangulations with smaller clique sizes. This runs counter to the conventional wisdom that triangulations that minimize clique size are always the most desirable. Ancestral pairs are presented as the basis for novel triangulation heuristics, and it is proven that only the addition of edges between ancestral pairs needs to be considered when searching for state-space-optimal triangulations. A genetic algorithm for large-clique triangulations is also presented. Empirical results are given on random and real-world graphs. A number of theoretical results are also presented, including an algorithm for determining whether a triangulation can be obtained via the elimination algorithm.

The second aspect is that speech graphs are variable length and have a repeating structure. Triangulation techniques are developed that are not limited by the repeating structure as defined by the graph designer.

The second goal of this thesis is to develop novel graphical models for improving recognition performance. A set of models is presented that enhances the standard model with information about syllabic segmentations. This segmentation information comes in the form of syllable nuclei locations. Using estimated locations, the graph gives improved discrimination between speech and noise when compared to a baseline model. Using locations derived from oracle information gives an overall improvement, and augmenting the oracle syllable nuclei information with information about lexical stress gives additional improvements over locations alone.
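As a rough illustration of why clique size alone does not determine inference cost, the following minimal sketch triangulates a small graph with the standard elimination algorithm and compares the total state space of two elimination orders. It is not the thesis's method: the function names, the four-node cycle, and the variable cardinalities are all hypothetical, and the cost measure assumes dense (non-deterministic) clique tables, so it does not model the determinism the thesis analyzes.

```python
from itertools import combinations


def triangulate_by_elimination(adjacency, order):
    """Eliminate nodes in `order`; return (fill-in edges, maximal cliques)."""
    # Work on a copy so the caller's graph is left untouched.
    adj = {v: set(ns) for v, ns in adjacency.items()}
    fill_in = set()
    cliques = []
    for v in order:
        neighbors = adj[v]
        # Connect every pair of remaining neighbors of v (fill-in edges).
        for a, b in combinations(sorted(neighbors), 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fill_in.add((a, b))
        clique = frozenset(neighbors | {v})
        # Keep the clique only if it is not contained in an earlier one.
        if not any(clique <= c for c in cliques):
            cliques.append(clique)
        # Remove v from the graph.
        for n in neighbors:
            adj[n].discard(v)
        del adj[v]
    return fill_in, cliques


def state_space(cliques, cardinality):
    """Sum over cliques of the product of member cardinalities."""
    total = 0
    for clique in cliques:
        size = 1
        for v in clique:
            size *= cardinality[v]
        total += size
    return total


# Hypothetical 4-cycle A-B-C-D-A with uneven variable cardinalities.
graph = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
card = {"A": 2, "B": 10, "C": 2, "D": 10}

for order in (["A", "B", "C", "D"], ["B", "A", "C", "D"]):
    fill, cliques = triangulate_by_elimination(graph, order)
    largest = max(len(c) for c in cliques)
    print(order, sorted(fill), largest, state_space(cliques, card))
```

Both orders produce two cliques of three variables each, yet the first adds the chord B-D and has a state space of 400, while the second adds A-C and has a state space of 80. Ranking triangulations by state space rather than by clique size is the kind of distinction the abstract draws, although the thesis's treatment of determinism goes beyond this dense-table count.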
Keywords/Search Tags:Graphical models, Speech, Triangulation, Information, Presented, Locations