
Cross-Modality Semantic Integration and Robust Interpretation of Multimodal User Interactions

Posted on: 2011-08-01
Degree: Ph.D
Type: Thesis
University: The Chinese University of Hong Kong (Hong Kong)
Candidate: Hui, Pui Yu
Full Text: PDF
GTID: 2448390002969629
Subject: Engineering
Abstract/Summary:
Multimodal systems can represent and manipulate semantics from different human communication modalities at different levels of abstraction; multimodal integration is required to combine the semantics from two or more modalities into an interpretable output for further processing. In this work, we develop a framework for automatic cross-modality semantic integration of multimodal user interactions that combine speech and pen gestures. The framework begins by generating partial interpretations for each input event as a ranked list of hypothesized semantics. We devise a cross-modality semantic integration procedure that aligns the hypothesis lists of every speech input event and every pen input event in a multimodal expression. This is achieved with a Viterbi alignment that enforces temporal ordering and semantic compatibility constraints on aligned events. The alignment enables generation of a unimodal paraphrase that is semantically equivalent to the original multimodal expression. Our experiments are based on a multimodal corpus in the navigation domain. Applying the integration procedure to manual transcripts generates correct unimodal paraphrases for around 96% of the multimodal inquiries in the test set; with automatic speech and pen recognition transcripts, however, performance drops to around 53%. To address this, we devise a hypothesis rescoring procedure that evaluates all candidate cross-modality integrations derived from the multiple recognition hypotheses of each modality. The rescoring function incorporates the integration score, the N-best purity of recognized spoken locative references (SLRs), and the distances between the coordinates of recognized pen gestures and their interpreted icons on the map. Cross-modality hypothesis rescoring improves performance, generating correct unimodal paraphrases for over 72% of the multimodal inquiries in the test set.

We have also applied latent semantic modeling (LSM) to the interpretation of multimodal user input consisting of speech and pen gestures. Each modality of a multimodal input carries semantics related to a domain-specific task goal (TG), and each input is manually annotated with a TG based on those semantics. Because multimodal input usually has a simpler syntactic structure and a different ordering of semantic constituents than unimodal input, we propose to use LSM to derive the latent semantics of multimodal inputs. To achieve this, we characterize each cross-modal integration pattern as a 3-tuple multimodal term comprising the SLR, the pen gesture type, and their temporal relation. The term correlation matrix is then decomposed using singular value decomposition (SVD) to derive the latent semantics automatically. TG inference on a disjoint test set based on these latent semantics is correct for 99% of the multimodal inquiries.
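To make the alignment step concrete, the sketch below shows an order-preserving dynamic-programming alignment of speech and pen input events over their ranked hypothesis lists, in the spirit of the Viterbi alignment described above. It is an illustration only: the class names, the compatibility() score, and the toy data are hypothetical and not taken from the thesis.

```python
# Minimal sketch (not the thesis implementation) of aligning ranked hypothesis
# lists of spoken locative references (SLRs) with those of pen gestures.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Hypothesis:
    label: str      # hypothesized semantic label, e.g. a location type
    score: float    # recognizer/interpretation confidence

@dataclass
class InputEvent:
    time: float                # onset time; event lists are assumed time-ordered
    hyps: List[Hypothesis]     # ranked list of hypothesized semantics

def compatibility(slr: Hypothesis, pen: Hypothesis) -> float:
    """Toy semantic-compatibility score: reward matching labels plus confidence."""
    match = 1.0 if slr.label == pen.label else 0.0
    return match + 0.5 * (slr.score + pen.score)

def align(slrs: List[InputEvent],
          pens: List[InputEvent]) -> Tuple[float, List[Tuple[int, int]]]:
    """Order-preserving DP alignment of speech events (SLRs) to pen gestures.

    Temporal ordering is enforced by only allowing pairings that keep both
    time-ordered sequences in order; semantic compatibility drives the score.
    """
    n, m = len(slrs), len(pens)
    NEG = float("-inf")
    best = [[NEG] * (m + 1) for _ in range(n + 1)]     # best[i][j]: first i SLRs vs first j gestures
    back: List[List[Optional[str]]] = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == NEG:
                continue
            if i < n and j < m:                        # pair SLR i with gesture j
                s = best[i][j] + max(compatibility(h, g)
                                     for h in slrs[i].hyps for g in pens[j].hyps)
                if s > best[i + 1][j + 1]:
                    best[i + 1][j + 1], back[i + 1][j + 1] = s, "pair"
            if i < n and best[i][j] > best[i + 1][j]:  # leave SLR i unaligned
                best[i + 1][j], back[i + 1][j] = best[i][j], "skip_slr"
            if j < m and best[i][j] > best[i][j + 1]:  # leave gesture j unaligned
                best[i][j + 1], back[i][j + 1] = best[i][j], "skip_pen"
    pairs, i, j = [], n, m                             # trace back aligned index pairs
    while i > 0 or j > 0:
        step = back[i][j]
        if step == "pair":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip_slr":
            i -= 1
        else:
            j -= 1
    return best[n][m], list(reversed(pairs))

# Toy usage: two SLRs and two gestures, aligned in temporal order.
slrs = [InputEvent(0.1, [Hypothesis("restaurant", 0.9), Hypothesis("rest_area", 0.3)]),
        InputEvent(1.4, [Hypothesis("hotel", 0.8)])]
pens = [InputEvent(0.3, [Hypothesis("restaurant", 0.7)]),
        InputEvent(1.6, [Hypothesis("hotel", 0.6)])]
score, pairs = align(slrs, pens)    # pairs == [(0, 0), (1, 1)]
```

In the actual system, the alignment score would come from the recognizers' hypothesis scores and the semantic compatibility of the interpreted locations, rather than from this toy compatibility function.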
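For the latent semantic modeling described in the second part of the abstract, the sketch below builds a matrix over 3-tuple multimodal terms (SLR type, pen gesture type, temporal relation) for a toy corpus, decomposes it with SVD, and infers a task goal (TG) for a new input by cosine similarity in the latent space. The term vocabulary, task goals, and folding-in convention are illustrative assumptions, not the thesis's actual corpus or model.

```python
# Minimal LSM sketch under assumed data: term-document matrix over 3-tuple
# multimodal terms, truncated SVD, and nearest-neighbor task-goal inference.
import numpy as np

# Hypothetical vocabulary of 3-tuple multimodal terms:
# (spoken locative reference type, pen gesture type, temporal relation).
TERMS = [
    ("singular_SLR", "point",  "overlap"),
    ("plural_SLR",   "circle", "speech_first"),
    ("route_SLR",    "stroke", "pen_first"),
    ("singular_SLR", "point",  "speech_first"),
]
INDEX = {t: i for i, t in enumerate(TERMS)}

# Toy training corpus: each multimodal input is a bag of terms plus its annotated task goal.
train = [
    ([TERMS[0], TERMS[0]], "locate_single_POI"),
    ([TERMS[1]],           "locate_multiple_POIs"),
    ([TERMS[2], TERMS[3]], "plan_route"),
]

def bag_to_vector(bag):
    """Term-count vector for one multimodal input."""
    v = np.zeros(len(TERMS))
    for t in bag:
        v[INDEX[t]] += 1.0
    return v

# Term-document matrix W (terms x documents), decomposed as W ~= U S Vt.
W = np.column_stack([bag_to_vector(bag) for bag, _ in train])
U, S, Vt = np.linalg.svd(W, full_matrices=False)
k = 2                                    # latent dimensions kept
Uk, Sk = U[:, :k], S[:k]
doc_latent = Vt[:k, :].T                 # one latent vector per training input

def infer_tg(bag):
    """Fold a new input into the latent space; return the task goal of the
    nearest training input by cosine similarity."""
    q = bag_to_vector(bag)
    q_latent = (q @ Uk) / Sk             # standard LSI folding-in: q^T U_k S_k^{-1}
    sims = doc_latent @ q_latent / (
        np.linalg.norm(doc_latent, axis=1) * np.linalg.norm(q_latent) + 1e-12)
    return train[int(np.argmax(sims))][1]

print(infer_tg([TERMS[2]]))              # -> plan_route on this toy corpus
```

The thesis builds the latent space from the annotated navigation corpus and reports 99% correct TG inference on the disjoint test set; this sketch only illustrates the SVD mechanics of such a model.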
Keywords/Search Tags: Multimodal, Semantic, Test set, Input