
Multimodal analysis of expressive human communication: Speech and gesture interplay

Posted on: 2009-06-02
Degree: Ph.D
Type: Thesis
University: University of Southern California
Candidate: Busso, Carlos
Full Text: PDF
GTID: 2448390005456694
Subject: Engineering
Abstract/Summary:
The verbal and non-verbal channels of human communication are intimately and intricately connected. As a result, gestures and speech exhibit high levels of correlation and coordination, a relationship strongly shaped by the linguistic and emotional content of the message being communicated. This interplay is observed across communication channels such as various aspects of speech, facial expressions, and movements of the hands, head, and body. For example, facial expressions and speech prosody tend to show stronger emotional modulation when the vocal tract is physically constrained by articulation in service of other linguistic communicative goals. Building on this analysis, applications in the recognition and synthesis of expressive communication are presented.

From an emotion recognition perspective, we propose building acoustically neutral models that measure the degree of similarity between input speech and neutral speech. The resulting fitness measure is then used as a feature for classification, achieving better performance than conventional classification schemes in terms of accuracy and robustness. Beyond detecting users' emotions, we analyze how these ideas support meta-analysis of user behavior, such as automatically monitoring and tracking the behaviors, strategies, and engagement of participants in multi-person interactions. We describe a case study of an intelligent meeting environment equipped with audio-visual sensors, in which we accurately estimate in real time not only the flow of the interaction but also how dominant and engaged each participant is during the discussion.

Finally, we show examples of how expressive behavior can be synthesized by exploiting the interrelation between speech and gestures. We propose synthesizing natural head-motion sequences from acoustic prosodic features by sampling from trained hidden Markov models (HMMs). Our comparison experiments show that the synthesized head motions are perceived to be as natural as captured head-motion sequences.
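The neutral-model fitness idea can be illustrated with a minimal sketch. The snippet below trains a Gaussian mixture on neutral-speech features and scores incoming utterances by their log-likelihood under that model; the Gaussian mixture, the feature dimensions, and the synthetic data are all stand-in assumptions, since the abstract does not specify the thesis's actual models or acoustic features.

```python
# Sketch of the "acoustically neutral model" fitness feature, assuming a
# Gaussian mixture as the neutral reference model (an illustrative choice,
# not necessarily the model used in the thesis).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_neutral_model(neutral_features, n_components=8, seed=0):
    """Fit a reference model on acoustic features of neutral speech only."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(neutral_features)
    return gmm

def fitness_feature(neutral_model, utterance_features):
    """Average log-likelihood of an utterance under the neutral model.
    Lower values suggest the speech deviates from neutral, i.e., may be
    emotionally modulated; this scalar can feed a downstream classifier."""
    return neutral_model.score(utterance_features)

# Usage with placeholder data: rows are frames, columns are hypothetical
# prosodic features (e.g., pitch- and energy-derived statistics).
rng = np.random.default_rng(0)
neutral = rng.normal(0.0, 1.0, size=(500, 4))    # synthetic neutral corpus
emotional = rng.normal(1.5, 2.0, size=(100, 4))  # synthetic emotional utterance

model = train_neutral_model(neutral)
print(fitness_feature(model, neutral[:100]))  # higher: close to neutral
print(fitness_feature(model, emotional))      # lower: deviates from neutral
```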
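The head-motion synthesis step admits a similarly hedged sketch. The snippet below fits a single Gaussian HMM to hypothetical joint prosody/head-pose frames and samples a new trajectory; jointly modeling both streams in one HMM is a simplification for illustration, not the thesis's exact formulation of prosody-driven head-motion synthesis.

```python
# Minimal sketch of sampling head motion from a trained HMM, using
# hmmlearn's GaussianHMM. The joint prosody/head-pose feature layout
# below is an assumption made for this example.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(1)
# Hypothetical training frames: [pitch, energy, yaw, head_pitch, roll].
frames = rng.normal(size=(1000, 5))

hmm = GaussianHMM(n_components=4, covariance_type="diag", random_state=1)
hmm.fit(frames)  # learn joint dynamics of prosody and head pose

# Synthesis: sample a fresh state/observation sequence from the trained
# model and keep only the head-pose dimensions as the motion trajectory.
sampled, states = hmm.sample(200)
head_motion = sampled[:, 2:]  # yaw, head_pitch, roll over 200 frames
print(head_motion.shape)      # (200, 3)
```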
Keywords/Search Tags: Speech, Communication, Expressive, Head