Most likely state sequence speech reconstruction using a generalized hidden semi-Markov model with two distinct regeneration times applied to English

Posted on:2005-02-27

Degree:Ph.D

Type:Dissertation

University:Rensselaer Polytechnic Institute

Candidate:Moore, Michael D

GTID:1458390011451293

Subject:Engineering

Abstract/Summary:

Most likely state sequence reconstruction is a new application for stochastic processes such as Hidden Markov Models (HMMs) and Hidden Semi-Markov Models (HSMMs). Current commercial speech recognizers (such as ViaVoice, HTK, and recognition engines from Microsoft) typically perform relatively well in controlled environments. When the signal-to-noise ratio is unfavorable, misrecognitions usually occur. Methods for overcoming this problem typically enhance or recover the acoustic features of individual frames of damaged speech during recognition.

In this research, damaged frames are assigned an unknown state symbol during recognition. The damaged state sequence contained in this partially recognized stream of frames is then statistically reconstructed using Markov models. Such a stream of frames is available from most recognizers when they perform recognition and segmentation of the incoming speech during semi-continuous speech recognition (semi-CSR). Using this stream of frames and the statistics of the language, replacements for damaged states are selected from a database, inserted into the state sequence, and replayed to perform reconstruction.

HMMs have a regeneration time (memory) on the order of a single frame and thus are relatively unsuited to reconstructing states or state sequences that contain more than a few contiguous damaged frames. HSMMs have a regeneration time on the order of a single state, and thus are capable of reconstructing multiple damaged frames within a state. Ergodic (fully connected) HSMMs that are not trained for any specific state sequence have difficulty reconstructing contiguous damaged states.

This research describes a Generalized HSMM (GHSMM) that reconstructs unknown frames, states, and state sequences in a stream of partially recognized frames produced by semi-CSR. GHSMMs are applicable to any state sequence that can be described in a Markovian fashion and that possesses a state sequence similarity measure. This similarity measure motivates a non-stationary transition matrix that adds the regeneration time of a state sequence to the HSMM. Because of this, GHSMMs can reconstruct the output of any recognizer, can correct both intermittent noise damage and incorrect recognition caused by recognizer imperfections, and in some cases achieve state sequence reconstruction rates exceeding 90% correct.
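
To make the idea of filling in unknown state symbols concrete, here is a minimal Python sketch of most likely state sequence reconstruction with a plain first-order Markov chain and Viterbi decoding. It is not the GHSMM described in the abstract: it has no duration model and no state sequence similarity measure. The function name `reconstruct`, the `UNKNOWN` marker, and the toy two-state transition probabilities are all assumptions made for illustration.

```python
import math

UNKNOWN = None  # hypothetical marker for a damaged frame


def reconstruct(frames, states, log_init, log_trans):
    """Return the most likely state path for a partially recognized stream.

    frames    : list of state labels, with UNKNOWN where a frame is damaged
    log_init  : dict state -> log initial probability
    log_trans : dict state -> dict state -> log transition probability
    """
    def log_emit(state, frame):
        if frame is UNKNOWN:
            return 0.0  # damaged frame: no evidence, every state allowed
        return 0.0 if frame == state else -math.inf  # recognized frame is fixed

    # best[s] = log-probability of the best path that ends in state s
    best = {s: log_init[s] + log_emit(s, frames[0]) for s in states}
    back = []  # one back-pointer table per frame after the first
    for frame in frames[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit(s, frame)
            pointers[s] = p
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy model (assumed values): states tend to persist from frame to frame.
states = ["a", "b"]
log_init = {"a": math.log(0.5), "b": math.log(0.5)}
log_trans = {
    "a": {"a": math.log(0.9), "b": math.log(0.1)},
    "b": {"a": math.log(0.3), "b": math.log(0.7)},
}
print(reconstruct(["a", UNKNOWN, UNKNOWN, "a"], states, log_init, log_trans))
```

Because each step of this chain conditions only on the previous frame, a long run of unknown frames is filled in from the transition probabilities alone. That is the short-memory limitation the abstract attributes to HMMs, and it is what the longer regeneration times of HSMMs and GHSMMs are meant to address.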

Keywords/Search Tags:

State sequence, Reconstruction, Regeneration time, Markov, Hidden, Speech, Recognition, Frames

Related items

1	Speech Recognition Method Based On Hidden Markov Models
2	Research Of Speech Recognition Based On Mixture Feature Extraction And Improved Continuous Hidden Markov Model
3	Research On Speech Recognition
4	Distributed Speech Recognition And Voice XML Standardlanguage In Vivid-Ring Application
5	Research On Isolated Word Speech Recognition Algorithm And System Simulation
6	Speech Recognition Algorithm Simulation And Software Design Based On DTW And HMM
7	A Real-Time Speech Recognition System Based On The Implementation Of FPGA
8	Study On Speaker-Independent Isolated Words Speech Recognition System
9	Small Vocabulary Chinese Isolated Word Speech Recognition Theory And Technology Research
10	The Research Of Small Vocabulary Speaker-Independent Isolated Word Speech Recognition System