Many signals have the property of 'co-occurrence', i.e., one type of signal is always accompanied by another type, often because they are generated by the same underlying process. In this work, the problem of inference among multimodal signals is studied. Specifically, a statistical approach based on the Gaussian mixture model (GMM) is used for this inference task. The GMM-based approach consists of two parts: model training and inference. For the inference part, three different methods are presented: direct inference, inference with dynamic features, and sliding-window-based inference. They are compared by quality of inference and real-time performance. For model training, besides traditional generative training such as Maximum Likelihood, two discriminative training criteria are derived, one for inference with dynamic features and one for inference with a sliding window. The training and inference methods have been applied to audio-visual conversion and tested on the LIPS2008/2009 open dataset. The experimental results demonstrate the effectiveness of discriminative training for the GMM-based approach.