Joint processing of audio-visual information for the recognition of emotional expressions in human-computer interaction

Posted on: 2001-02-22
Degree: Ph.D.
Type: Thesis
University: University of Illinois at Urbana-Champaign
Candidate: Chen, Lawrence Shao-Hsien
Full Text: PDF
GTID: 2468390014458453
Subject: Engineering
Abstract/Summary:
Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for controlling the computer, such as voice, gesture, and force feedback, are emerging. Among these, voice and vision are two natural modalities in human-to-human communication. Automatic speech recognition (ASR) technology has matured enough to let users dictate to a word processor or operate the computer with voice commands, and computer vision techniques have enabled the computer to see. Interacting with computers through these modalities is much more natural for people, and the progression is toward the kind of interaction that occurs between humans. Despite these advances, one ingredient necessary for natural interaction is still missing: emotions. Emotions play an important role in human-to-human communication and interaction, allowing people to express themselves beyond the verbal domain. The ability to understand human emotions is desirable for the computer in applications such as computer-aided learning and user-friendly online help.

This thesis addresses the problem of detecting human emotional expressions by computer from the voice and facial motions of the user. The computer is equipped with a microphone to listen to the user's voice and a video camera to look at the user's face. Prosodic features in the audio and motions exhibited on the face can help the computer infer the user's emotional state, assuming the user is willing to show emotion. The thesis also addresses the coupling between voice and facial expression: sometimes the user moves the lips to produce speech, and sometimes the user exhibits a facial expression without speaking any words. It is therefore important to handle the two modalities accordingly. In particular, a pure “facial expression detector” will not function properly while the person is speaking, and a pure “vocal emotion recognizer” is useless while the user is silent. This thesis therefore proposes a complementary relationship between audio and video: although the two modalities do not couple strongly in time, they complement each other. Similar facial expressions may be accompanied by different vocal characteristics, and vocal emotions with similar acoustic properties may be accompanied by distinct facial behaviors.
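To make the complementary scheme concrete, the following minimal Python sketch routes each audio-video frame to whichever single-modality recognizer is informative for it: the vocal channel while the user speaks, the facial channel while the user is silent. All feature names, thresholds, and classification rules here are illustrative assumptions, not the thesis's actual features or classifiers.

    # A hypothetical sketch of complementary audio-visual emotion
    # recognition; placeholder features and rules, not the thesis's method.
    from dataclasses import dataclass

    @dataclass
    class Frame:
        audio_energy: float    # short-time energy of the audio signal
        pitch_variance: float  # prosodic feature from the voice
        mouth_openness: float  # facial-motion feature from the camera
        brow_raise: float      # facial-motion feature from the camera

    def is_speaking(frame: Frame, energy_threshold: float = 0.1) -> bool:
        # Crude voice-activity test: treat audible energy as speech.
        return frame.audio_energy > energy_threshold

    def vocal_emotion(frame: Frame) -> str:
        # Placeholder prosody rule; a real system would use a trained
        # classifier over many prosodic features.
        return "angry" if frame.pitch_variance > 0.5 else "neutral"

    def facial_emotion(frame: Frame) -> str:
        # Placeholder facial-motion rule, again standing in for a classifier.
        if frame.brow_raise > 0.5:
            return "surprised"
        return "happy" if frame.mouth_openness > 0.5 else "neutral"

    def recognize(frame: Frame) -> str:
        # Route to the informative modality: a pure facial-expression
        # detector is confounded by lip motion during speech, and a pure
        # vocal recognizer has no input during silence.
        if is_speaking(frame):
            return vocal_emotion(frame)
        return facial_emotion(frame)

    if __name__ == "__main__":
        silent = Frame(audio_energy=0.0, pitch_variance=0.0,
                       mouth_openness=0.7, brow_raise=0.1)
        talking = Frame(audio_energy=0.4, pitch_variance=0.8,
                        mouth_openness=0.6, brow_raise=0.2)
        print(recognize(silent))   # facial channel used -> "happy"
        print(recognize(talking))  # vocal channel used  -> "angry"

The design choice mirrored here is selection rather than fusion: because the two channels do not couple strongly in time, each frame draws its emotion estimate from the channel that is valid at that moment instead of averaging both.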
Keywords/Search Tags: Computer, Expression, Human, Emotions, User, Interaction, Emotional