
Most likely state sequence speech reconstruction using a generalized hidden semi Markov model with two distinct regeneration times applied to English

Posted on: 2005-02-27  Degree: Ph.D.  Type: Dissertation
University: Rensselaer Polytechnic Institute  Candidate: Moore, Michael D  Full Text: PDF
GTID: 1458390011451293  Subject: Engineering
Abstract/Summary:
Most likely state sequence reconstruction is a new application for stochastic processes such as Hidden Markov Models (HMMs) and Hidden Semi-Markov Models (HSMMs). Current commercial speech recognizers (such as ViaVoice, HTK, and recognition engines from Microsoft) typically perform relatively well in controlled environments. When the signal-to-noise ratio is unfavorable, misrecognitions usually occur. Methods for overcoming this problem typically employ enhancement or recovery of the acoustic features of individual frames of damaged speech during recognition.

In this research, damaged frames are assigned an unknown state symbol during recognition. The damaged state sequence contained in this partially recognized stream of frames is then statistically reconstructed using Markov models. Such a stream of frames is available from most recognizers when they perform recognition and segmentation of the incoming speech during semi-continuous speech recognition (semi-CSR). Using this stream of frames and the statistics of the language, replacements for damaged states are selected from a database, inserted into the state sequence, and replayed to perform reconstruction.

HMMs have a regeneration time (memory) on the order of a single frame and thus are relatively unsuited for reconstruction of states or state sequences containing more than a few contiguous damaged frames. HSMMs have a regeneration time on the order of a single state, and thus are capable of reconstructing multiple damaged frames within a state. Ergodic (fully connected) HSMMs that are not trained for any specific state sequence have difficulty reconstructing contiguous damaged states.

This research describes a Generalized HSMM (GHSMM) that reconstructs unknown frames, states, and state sequences in a stream of partially recognized frames produced by semi-CSR. GHSMMs are applicable to any state sequence that can be described in a Markovian fashion and that possesses a state sequence similarity measure. This similarity measure motivates a non-stationary transition matrix that adds the regeneration time of a state sequence to the HSMM. Because of this, GHSMMs can reconstruct the output of any recognizer, can correct both intermittent noise damage and incorrect recognition caused by recognizer imperfections, and in some cases achieve state sequence reconstruction rates exceeding 90% correct.
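
The basic idea of filling in frames marked with an unknown state symbol can be illustrated with a minimal sketch. It is not the GHSMM described in the abstract: it assumes a plain first-order Markov chain over states (memory of a single frame, as with the HMM case above), and the function name `reconstruct`, the `UNKNOWN` marker, and the `trans`/`init` probability tables are hypothetical names introduced only for illustration. Recognized frames are held fixed, and a Viterbi-style search chooses the most likely states for the damaged frames.

```python
import math

UNKNOWN = None  # hypothetical marker for a damaged frame


def reconstruct(frames, states, trans, init):
    """Fill UNKNOWN frames with the most likely states under a
    first-order Markov chain (illustrative only, not the GHSMM).

    frames: list of state labels, with UNKNOWN for damaged frames
    states: list of all state labels
    trans:  trans[p][s] = probability of moving from state p to state s
    init:   init[s] = probability that the sequence starts in state s
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    def allowed(frame):
        # A recognized frame is clamped to its label; a damaged frame is free.
        return states if frame is UNKNOWN else [frame]

    # best[s] = log-probability of the best path ending in s at the current frame
    best = {s: log(init[s]) for s in allowed(frames[0])}
    back = []  # back[t - 1][s] = best predecessor of state s at frame t
    for frame in frames[1:]:
        new_best, pointers = {}, {}
        for s in allowed(frame):
            prev = max(best, key=lambda p: best[p] + log(trans[p][s]))
            new_best[s] = best[prev] + log(trans[prev][s])
            pointers[s] = prev
        best = new_best
        back.append(pointers)

    # Trace the best path backwards from the most likely final state.
    state = max(best, key=best.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Because the chain remembers only the previous frame, long runs of contiguous unknown frames are filled using the transition statistics alone, which is the limitation that the longer regeneration times of the HSMM and GHSMM are intended to address.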
Keywords/Search Tags:State sequence, Reconstruction, Regeneration time, Markov, Hidden, Speech, Recognition, Frames