Towards natural child-computer interaction: Recognizing spoken communicative styles

Posted on: 2007-09-22
Degree: Ph.D.
Type: Thesis
University: University of Southern California
Candidate: Yildirim, Serdar
Full Text: PDF
GTID: 2458390005483427
Subject: Engineering
Abstract/Summary:
The need for automatic recognition of users' communicative styles within a spoken dialog system framework has received increased attention with the demand for computer interfaces that provide natural spoken interaction. This thesis addresses the design of automatic communicative-style recognition systems for computer interfaces that target child users. The specific focus is on two communicative aspects: (1) recognizing a child's emotional state during an interaction, and (2) detecting disfluencies in children's spontaneous speech. Knowledge about a child's emotional state and about metalinguistic events such as disfluencies helps the system understand the child's communicative intent, so that the interaction is more successful and more natural. We adopt a data-driven approach. An important requirement of most data-driven processing systems is the availability of transcribed and annotated data. This thesis therefore also addresses the problems of multimodal data collection and annotation in the context of a child's verbal and non-verbal interactions with a spoken dialog agent.

The first study deals with automatically detecting frustration and politeness from the child's spoken communication cues and examines how these attitudes differ as a function of age and gender. Several information sources, namely acoustic, lexical, and discourse features as well as their combinations, are used for this purpose. Results show that discourse and acoustic information have more discriminative power than language information for detecting frustration, whereas language information is more discriminative for politeness. Results also point to effects of age and gender on recognition performance.

The second study analyzes disfluencies in children's spontaneous speech in the context of spoken-dialog-based computer game play and addresses the automatic detection of disfluency boundaries by means of audio-visual information. Visual cues are obtained directly from the video sequence using an optical flow technique in which motion properties are estimated from intensity changes between frames. By using visual information along with acoustic and language information, the proposed algorithm improves the performance and robustness of the detection system over conventional approaches.
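
To illustrate the idea of deriving a visual cue from intensity changes between frames, the following is a minimal sketch, not the method used in the thesis. It uses plain frame differencing, which is a much simpler proxy than optical flow: it measures how much the image changes between consecutive frames but does not estimate motion direction. The function name `motion_intensity` and the input layout (a stack of grayscale frames) are assumptions made for this example.

```python
import numpy as np


def motion_intensity(frames):
    """Return one motion score per pair of consecutive grayscale frames.

    Assumption for illustration: `frames` has shape
    (num_frames, height, width). The score is the mean absolute
    pixel-intensity change between frame t and frame t + 1.
    This is frame differencing, not optical flow.
    """
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError(
            "expected shape (num_frames, height, width) with at least 2 frames"
        )
    # Absolute per-pixel change between each consecutive pair of frames.
    diffs = np.abs(np.diff(frames, axis=0))
    # Average over the spatial dimensions to get one value per frame pair.
    return diffs.mean(axis=(1, 2))
```

A per-frame score of this kind could, in principle, be aligned with acoustic and language features over time, but how the thesis combines the modalities is not described in the abstract.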
Keywords/Search Tags: Spoken, Communicative, Information, System, Interaction, Computer, Recognizing, Natural