
Kinematic measurement and feature sets for automatic speech recognition

Posted on: 2002-07-01
Degree: Ph.D
Type: Thesis
University: California Institute of Technology
Candidate: Fain, Daniel Clark
Full Text: PDF
GTID: 2468390011992727
Subject: Computer Science
Abstract/Summary:
This thesis examines the use of measured and inferred kinematic information in automatic speech recognition and lipreading, and investigates the relative information content and recognition performance of vowels and consonants. The kinematic information describes the motions of the organs of speech, the articulators. The contributions of this thesis include a new device and set of algorithms for lipreading (their design, construction, implementation, and testing); the incorporation of direct articulator-position measurements into a speech recognizer; and a reevaluation of some assumptions regarding vowels and consonants.

The motivation for including articulatory information is to improve the modeling of coarticulation and to reconcile multiple input modalities for lipreading. Coarticulation, a ubiquitous phenomenon, is the process by which speech sounds are modified by the preceding and following sounds.

To be useful in practice, a recognizer will have to infer articulatory information from sound, video, or both. Previous work made progress toward recovering articulation from sound. The present project assumes that such recovery is possible and examines the advantage of joint acoustic-articulatory representations over acoustic-only ones. Also reported is an approach to recovery from video in which camera placement (side view, head-mounted) and lighting are chosen to obtain lip-motion information robustly.

Joint acoustic-articulatory recognition experiments were performed using the University of Wisconsin X-ray Microbeam Speech Production Database. Speaker-dependent monophone recognizers, based on hidden Markov models, were tested on paragraphs each lasting about 20 seconds. Results were evaluated at the phone level and tabulated by several classes (vowel, stop, and fricative). Measured articulator coordinates were transformed by principal components analysis, and velocity and acceleration were appended.
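The feature-construction step described above (principal components analysis of the articulator coordinates, with velocity and acceleration appended) can be sketched as follows. This is a minimal illustration, not the thesis's exact recipe: the number of retained components, the frame count, and the use of finite differences for the derivatives are all assumptions.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project frames onto the top principal components.
    X: (n_frames, n_dims) array of articulator coordinates."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def append_deltas(F):
    """Append velocity and acceleration, computed as first and
    second finite differences along the time axis."""
    vel = np.gradient(F, axis=0)
    acc = np.gradient(vel, axis=0)
    return np.hstack([F, vel, acc])

# Hypothetical stream: 8 pellet coordinates over 200 frames
coords = np.random.randn(200, 8)
feats = append_deltas(pca_transform(coords, n_components=6))
# feats has 18 columns: 6 PCA dims + 6 velocity + 6 acceleration
```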
Concatenating the transformed articulatory information with a standard acoustic (cepstral) representation reduced the error rate by 7.4%, a result with across-speaker statistical significance (p = 0.018). Articulation improved recognition more for male speakers than for female speakers, and more for vowels than for fricatives or stops.

The analysis of vowels, stops, and fricatives included both the articulatory recognizer of chapter 3 and other recognizers for comparison. The information content of the different classes was also estimated. Some previous assumptions about recognition performance are shown to be false, and the findings on information content require consonants to be defined to include vowel-like sounds.
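The joint acoustic-articulatory representation amounts to frame-wise concatenation of the two feature streams. A minimal sketch, assuming both streams are already aligned to the same frame rate; the dimensionalities used here (13 cepstral, 18 articulatory) are hypothetical.

```python
import numpy as np

def joint_features(cepstra, artic):
    """Frame-wise concatenation of an acoustic (cepstral) stream
    and an articulatory stream; both must share a frame rate."""
    assert cepstra.shape[0] == artic.shape[0], "streams must be frame-aligned"
    return np.hstack([cepstra, artic])

cepstra = np.random.randn(200, 13)  # hypothetical cepstral features
artic = np.random.randn(200, 18)    # hypothetical articulatory features
joint = joint_features(cepstra, artic)  # 200 frames, 31 dims each
```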
Keywords/Search Tags: Recognition, Information, Speech, Kinematic