Graphical models for large vocabulary speech recognition

Posted on:2009-11-09

Degree:Ph.D

Type:Thesis

University:University of Washington

Candidate:Bartels, Chris Dennis

GTID:2448390002495268

Subject:Engineering

Abstract/Summary:

This thesis presents triangulation methodology and new graphical models for automatic speech recognition. The improved triangulation techniques presented here can lower the computational cost of exact probabilistic inference in graphical models. The focus is on finding triangulations of the graphical models used in speech and language applications. The triangulation procedures developed in the graphical model community do not address two aspects of such graphs. The first aspect is that the graphs have a high degree of determinism. It is shown that, in the presence of determinism, the optimal triangulation can lie completely outside the search space of the most widely adopted triangulation techniques. It is also demonstrated that, when determinism is present, certain large-clique triangulations can outperform triangulations with smaller clique sizes. This runs counter to the conventional wisdom that triangulations that minimize clique size are always most desirable. Ancestral pairs are presented as the basis for novel triangulation heuristics, and it is proven that only the addition of edges between ancestral pairs needs to be considered when searching for state-space-optimal triangulations. A genetic algorithm for large-clique triangulations is also presented. Empirical results are given on random and real-world graphs. A number of theoretical results are also presented, including an algorithm for determining whether a triangulation can be obtained via the elimination algorithm. The second aspect is that speech graphs are variable-length and have a repeating structure. Triangulation techniques are developed that are not limited by the repeating structure as defined by the graph designer.

The second goal of this thesis is to develop novel graphical models for improving recognition performance. A set of models is presented that enhance the standard model with information about syllabic segmentations. This segmentation information comes in the form of syllable nuclei locations. With estimated locations, the graph gives improved discrimination between speech and noise compared to a baseline model. With locations derived from oracle information, an overall improvement is obtained, and augmenting the oracle syllable nuclei information with information about lexical stress gives additional improvements over locations alone.
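
To make the triangulation discussion concrete, the following is a minimal sketch of the textbook elimination algorithm that the abstract refers to. It is not the thesis's method. The function names, the toy 4-cycle graph, and the variable cardinalities are all assumptions chosen for illustration. The cost reported is the plain sum of clique state-space sizes. It does not model determinism, which is exactly the case where the abstract says this kind of search can miss the optimum.

```python
from itertools import combinations
from math import prod


def triangulate_by_elimination(edges, order):
    """Triangulate an undirected graph by eliminating nodes in `order`.

    Returns (fill_in_edges, elimination_cliques). Hypothetical helper,
    written for illustration only.
    """
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    fill_in, cliques = [], []
    for v in order:
        nbrs = adj[v]
        # The node and its remaining neighbors form one elimination clique.
        cliques.append(frozenset(nbrs | {v}))
        # Connect every pair of remaining neighbors (fill-in edges).
        for a, b in combinations(sorted(nbrs), 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fill_in.append((a, b))
        # Remove the eliminated node from the graph.
        for n in nbrs:
            adj[n].discard(v)
        del adj[v]
    return fill_in, cliques


def maximal_cliques(cliques):
    """Keep only cliques that are not proper subsets of another clique."""
    return [c for c in set(cliques) if not any(c < d for d in cliques)]


def state_space(cliques, card):
    """Sum over maximal cliques of the product of variable cardinalities."""
    return sum(prod(card[v] for v in c) for c in maximal_cliques(cliques))


if __name__ == "__main__":
    # Toy 4-cycle A-B-C-D-A with assumed cardinalities.
    edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
    card = {"A": 2, "B": 3, "C": 2, "D": 3}
    for order in (["A", "B", "C", "D"], ["B", "A", "C", "D"]):
        fill, cliques = triangulate_by_elimination(edges, order)
        print(order, fill, state_space(cliques, card))
```

On this toy graph the two orders add different fill-in edges (B-D versus A-C) and yield different total state-space sizes, even though both triangulations have the same maximum clique size. Choosing among such alternatives is the search problem the triangulation heuristics in the abstract address.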

Keywords/Search Tags:

Graphical models, Speech, Triangulation, Information, Presented, Locations

Related items

1	Reasoning and Decisions in Probabilistic Graphical Models -- A Unified Framework
2	Testing Independence in High Dimensions & Identifiability of Graphical Models
3	Probabilistic Graphical Models For Visual Feature Analysis
4	Graphical Models for Heterogeneous Transfer Learning and Co-reference Resolution
5	Speech enhancement based on perceptual loudness and statistical models of speech
6	AND/OR search spaces for graphical models
7	Time-like graphical models
8	New approaches using probabilistic graphical models in health economics and outcomes research
9	Layered graphical models for tracking partially-occluded moving objects in video
10	Ant colony inspired models for trust-based recommendations