
Statistical Multimodal Signal Inference And Its Applications On Audio-visual Speech Processing

Posted on: 2012-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: W Han
Full Text: PDF
GTID: 2218330362959276
Subject: Computer application technology
Abstract/Summary:
Many signals have the property of 'co-occurrence', i.e., one type of signal is always accompanied by another type, often because they are generated by the same underlying process. In this work, the problem of inference among multimodal signals is studied. Specifically, a statistical approach based on the Gaussian mixture model (GMM) is used for this inference task. The GMM-based approach consists of two parts: model training and inference. For the inference part, three different methods are presented: direct inference, inference with dynamic features, and sliding window-based inference. They are compared by quality of inference and real-time performance. For model training, besides traditional generative training such as Maximum Likelihood, two discriminative training criteria are derived, one for inference with dynamic features and one for inference with a sliding window. The methods for training and inference have been applied to audio-visual conversion and tested on the LIPS2008/2009 open dataset. The experimental results demonstrate the effectiveness of discriminative training for the GMM-based approach.
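The abstract does not give the inference equations, so the following is only a minimal sketch of how inference from one modality to another is commonly done with a joint GMM: the estimate of the target y given the source x is the responsibility-weighted sum of each component's conditional mean. It assumes scalar x and y for brevity, and the function name, argument layout, and parameter values are hypothetical, not taken from the thesis.

```python
import math


def gmm_conditional_mean(x, weights, means, covs):
    """Sketch: estimate E[y | x] under a joint GMM over scalar (x, y).

    Assumed (hypothetical) layout:
      weights[k] = mixture weight of component k
      means[k]   = (mu_x, mu_y)
      covs[k]    = (var_x, cov_xy, var_y)
    """
    scores = []       # unnormalised posterior of each component given x
    cond_means = []   # per-component conditional mean of y given x
    for w, (mu_x, mu_y), (var_x, cov_xy, _var_y) in zip(weights, means, covs):
        # Marginal Gaussian likelihood of x under this component.
        lik = math.exp(-0.5 * (x - mu_x) ** 2 / var_x) / math.sqrt(
            2.0 * math.pi * var_x
        )
        scores.append(w * lik)
        # Conditional mean of y given x for a bivariate Gaussian.
        cond_means.append(mu_y + cov_xy / var_x * (x - mu_x))
    total = sum(scores)
    # Note: total can underflow to 0 for x far from every component;
    # a practical version would work in the log domain.
    return sum(s / total * m for s, m in zip(scores, cond_means))


if __name__ == "__main__":
    # Illustrative two-component model with made-up parameters.
    weights = [0.5, 0.5]
    means = [(-1.0, -1.0), (1.0, 1.0)]
    covs = [(1.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(gmm_conditional_mean(0.0, weights, means, covs))
```

This sketch maps each input frame independently. The dynamic-feature and sliding-window variants named in the abstract additionally use temporal context, which is not modelled here.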
Keywords/Search Tags: multimodal signal, Gaussian mixture model, time series inference, discriminative training, audio-visual signal