
Multimodal analysis of expressive human communication: Speech and gesture interplay

Posted on: 2009-06-02
Degree: Ph.D
Type: Thesis
University: University of Southern California
Candidate: Busso, Carlos
Full Text: PDF
GTID: 2448390005456694
Subject: Engineering
Abstract/Summary:
The verbal and non-verbal channels of human communication are intimately and intricately connected. As a result, gestures and speech exhibit high levels of correlation and coordination, a relationship strongly shaped by the linguistic and emotional content of the message being communicated. This interplay is observed across communication channels such as various aspects of speech, facial expressions, and movements of the hands, head, and body. For example, facial expressions and speech prosody tend to show stronger emotional modulation when the vocal tract is physically constrained by articulation in service of other linguistic communicative goals. Building on this analysis, applications in the recognition and synthesis of expressive communication are presented.

From an emotion recognition perspective, we propose building acoustically neutral models that measure the degree of similarity between input speech and neutral speech. The resulting fitness measure is then used as a feature for classification, achieving better performance than conventional classification schemes in terms of accuracy and robustness. Beyond detecting users' emotions, we analyze how these ideas support meta-analysis of user behavior, such as automatically monitoring and tracking the behaviors, strategies, and engagement of participants in multi-person interactions. We describe a case study of an intelligent meeting environment equipped with audio-visual sensors, in which we accurately estimate in real time not only the flow of the interaction but also how dominant and engaged each participant is during the discussion.

Finally, we show examples of how expressive behavior can be synthesized by exploiting the interrelation between speech and gestures. We propose synthesizing natural head-motion sequences from acoustic prosodic features by sampling from trained hidden Markov models (HMMs). Our comparison experiments show that the synthesized head motions are perceived to be as natural as captured head-motion sequences.
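The neutral-model fitness idea can be illustrated with a minimal sketch. The snippet below trains a Gaussian mixture on neutral-speech features and scores incoming utterances by their log-likelihood under that model; the Gaussian mixture, the feature dimensions, and the synthetic data are all stand-in assumptions, since the abstract does not specify the thesis's actual models or acoustic features.

```python
# Sketch of the "acoustically neutral model" fitness feature, assuming a
# Gaussian mixture as the neutral reference model (an illustrative choice,
# not necessarily the model used in the thesis).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_neutral_model(neutral_features, n_components=8, seed=0):
    """Fit a reference model on acoustic features of neutral speech only."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(neutral_features)
    return gmm

def fitness_feature(neutral_model, utterance_features):
    """Average log-likelihood of an utterance under the neutral model.
    Lower values suggest the speech deviates from neutral, i.e., may be
    emotionally modulated; this scalar can feed a downstream classifier."""
    return neutral_model.score(utterance_features)

# Usage with placeholder data: rows are frames, columns are hypothetical
# prosodic features (e.g., pitch- and energy-derived statistics).
rng = np.random.default_rng(0)
neutral = rng.normal(0.0, 1.0, size=(500, 4))    # synthetic neutral corpus
emotional = rng.normal(1.5, 2.0, size=(100, 4))  # synthetic emotional utterance

model = train_neutral_model(neutral)
print(fitness_feature(model, neutral[:100]))  # higher: close to neutral
print(fitness_feature(model, emotional))      # lower: deviates from neutral
```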
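The head-motion synthesis step admits a similarly hedged sketch. The snippet below fits a single Gaussian HMM to hypothetical joint prosody/head-pose frames and samples a new trajectory; jointly modeling both streams in one HMM is a simplification for illustration, not the thesis's exact formulation of prosody-driven head-motion synthesis.

```python
# Minimal sketch of sampling head motion from a trained HMM, using
# hmmlearn's GaussianHMM. The joint prosody/head-pose feature layout
# below is an assumption made for this example.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(1)
# Hypothetical training frames: [pitch, energy, yaw, head_pitch, roll].
frames = rng.normal(size=(1000, 5))

hmm = GaussianHMM(n_components=4, covariance_type="diag", random_state=1)
hmm.fit(frames)  # learn joint dynamics of prosody and head pose

# Synthesis: sample a fresh state/observation sequence from the trained
# model and keep only the head-pose dimensions as the motion trajectory.
sampled, states = hmm.sample(200)
head_motion = sampled[:, 2:]  # yaw, head_pitch, roll over 200 frames
print(head_motion.shape)      # (200, 3)
```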
Keywords/Search Tags: Speech, Communication, Expressive, Head