Research On Dynamic Bayesian Network Models For Audio-Visual Speech Recognition

Posted on:2008-09-16

Degree:Master

Type:Thesis

Country:China

Candidate:A L Sun

Full Text:PDF

GTID:2178360212978890

Subject:Computer Science and Technology

Abstract/Summary:

Dynamic Bayesian Networks (DBNs) are being used in speech recognition because of their extensibility and their powerful description, inference, and learning abilities for time series. In this thesis, the author designs a single-stream DBN model for audio or video speech recognition and phoneme (or viseme) segmentation. The work is outlined as follows.

First, the author investigates a continuous speech recognition system based on the Hidden Markov Model (HMM), including the embedded training and recognition procedures. A connected-digit audio and video database has been recorded. For the audio stream, Mel-Frequency Cepstral Coefficient (MFCC) features are extracted. For the video stream, three kinds of lip features are extracted: 1) static geometric features, 2) static and delta (dynamic) geometric features, 3) linearly interpolated geometric features based on the static and dynamic features. Audio experiments show that the triphone HMM achieves higher word recognition rates than the monophone HMM. Video experiments show that the third kind of lip features achieves higher word recognition rates than the other two.

Second, the author studies the basic principles of DBNs: topology, the probabilistic inference formula, tree inference, frontier inference, and the junction tree algorithm. The results show that the DBN is more general, explicit, and extensible than the HMM.

Third, the author studies and improves the Word-State DBN (WS-DBN) model, designs an acoustic speech model based on the Word-Phone DBN (WP-DBN) model and a visual speech model based on the Word-Viseme DBN (WV-DBN) model, and implements the WS-DBN and WV-DBN systems with the Graphical Models Toolkit (GMTK). The WP-DBN and WV-DBN models emulate the word-phone (or word-viseme) structure, represent the transition probabilities between phones (or visemes), and can output the phone (or viseme) segmentation with timing boundaries.

Finally, the author defines evaluation criteria: word recognition rate, word recognition accuracy, and phone (or viseme) segmentation score. The recognition and segmentation performance of the WS-DBN model, WP-DBN model, WV-DBN model, monophone HMM, triphone HMM, and monoviseme HMM is compared in different noisy environments. Audio experimental results show that the WP-DBN model: 1) has almost the same recognition rates as the triphone HMM for clean speech; 2) is more robust to noisy environments than the HMM. Video...
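The abstract names word recognition rate and word recognition accuracy as evaluation criteria but does not give their formulas. The sketch below assumes the definitions commonly used in HMM-based recognition toolkits: rate = (N - D - S) / N and accuracy = (N - D - S - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions, and insertions in a minimum-edit-distance alignment. The function names `align_counts` and `word_scores` are hypothetical and are not taken from the thesis.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for a
    minimum-edit-distance alignment of hyp against ref."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)      # all reference words deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)      # all hypothesis words inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                match_or_sub = (t, s, d, k)
            else:
                match_or_sub = (t + 1, s + 1, d, k)
            t, s, d, k = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, k)
            t, s, d, k = cost[i][j - 1]
            insertion = (t + 1, s, d, k + 1)
            # Tuples compare on total error count first.
            cost[i][j] = min(match_or_sub, deletion, insertion)
    _, subs, dels, ins = cost[n][m]
    return subs, dels, ins


def word_scores(ref, hyp):
    """Return (recognition_rate, recognition_accuracy) under the
    assumed definitions; both are fractions of the reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    subs, dels, ins = align_counts(ref, hyp)
    n = len(ref)
    rate = (n - dels - subs) / n
    accuracy = (n - dels - subs - ins) / n
    return rate, accuracy


if __name__ == "__main__":
    reference = "one two three four".split()
    hypothesis = "one three three four five".split()
    print(word_scores(reference, hypothesis))
```

Under these assumed definitions, accuracy is never higher than rate because it also penalizes inserted words, which is why the two criteria are usually reported together.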

Keywords/Search Tags:

Dynamic Bayesian Network (DBN), Graphical Models Toolkit (GMTK), Word-Phone DBN (WP-DBN), Word-Viseme DBN (WV-DBN)

Related items

1	Word segmentation, word recognition, and word learning: A computational model of first language acquisition
2	Intelligent Optimization Based On Graphical Models
3	Topic Model For Short Texts Based On Word Triangles
4	Research And Implementation Of Chinese Word Segmentation Algorithm
5	Word Sense Disambiguation Technology Research Based On Hownet And Bayesian Model
6	Context Computing Applications, Word Disambiguation
7	Research On Chinese Word Segmentation Method Based On Word Embedding
8	Chinese Word Sense Disambiguation
9	Research On Short Text Topic Model Based On Semantic Information And Word Triangle
10	Multi-prototype Word Vector Based On Context Word Embedding