Classification and recognition of speech under perceptual stress using neural networks and N-D HMMs

Posted on: 1997-12-09
Degree: Ph.D.
Type: Dissertation
University: Duke University
Candidate: Womack, Brian David
Full Text: PDF
GTID: 1468390014484297
Subject: Electrical engineering
Abstract/Summary:
The primary contribution of this study is the formulation of a stress classification algorithm. The secondary contribution is the formulation of a multi-dimensional hidden Markov model (N-D HMM) for unified stressed speech classification and recognition. Perceptually induced stress affects a speaker's intention to produce speech due to the presence of emotion, environmental noise (i.e., Lombard effect), or actual task workload. Analysis of articulatory, excitation, and cepstral-based features is conducted using a previously established stressed speech database (SUSAS). Targeted feature sets are selected across ten stress conditions (Apache helicopter, Angry, Clear, Fast, Lombard effect, Loud, Slow, Soft, and two workload tasks). Four stress classification approaches are formulated using both neural network and hidden Markov model based systems. Stress classification rates for the neural network based mono-partition non-targeted feature and tri-partition targeted feature algorithms are 56.68% (5 words, 1 speaker) and 91.01% (35 words, 11 speakers), respectively, across ten stress conditions for specific application scenarios. The stress classification rate for both the 1-D and N-D HMM across Neutral, Angry, Clear, and Lombard effect speech is 57.6%, with the N-D model yielding greater stress score separation. Stress-directed, speaker-independent speech recognition is shown to improve performance over Neutral and multi-style trained speech recognizers by +10.95% and +15.43%, respectively. Finally, the N-D HMM is used to unify the stress classification and stress-dependent speech recognition tasks. The N-D HMM structure is derived from Markov random field theory, enabling explicit sub-phoneme stress classification at the state level. This formulation better integrates perceptually induced stress effects. The N-D HMM reaches a recognition rate of 94.41%, an improvement of +15.72% over the 1-D HMM based stress-directed speech recognition system. This is +26.67% better than the Neutral-trained 1-D HMM, which has a recognition rate of 67.74%. It is suggested that the developed stress classification algorithms are applicable to other speech under stress environments, yielding significant performance gains in speech processing systems due to the incorporation of speaker stress effects.
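The recognition figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes that every quoted gain is an additive difference in percentage points measured on the same test condition, which the abstract does not state explicitly. The constant and function names are hypothetical, and the derived rates for the stress-directed and multi-style recognizers are inferences under that assumption, not results reported by the author.

```python
from decimal import Decimal

# Figures quoted in the abstract (recognition rates in percent).
ND_HMM_RATE = Decimal("94.41")        # N-D HMM
NEUTRAL_1D_RATE = Decimal("67.74")    # Neutral trained 1-D HMM

# Quoted gains, ASSUMED here to be additive percentage-point differences.
GAIN_ND_OVER_STRESS_DIRECTED = Decimal("15.72")
GAIN_ND_OVER_NEUTRAL = Decimal("26.67")
GAIN_STRESS_DIRECTED_OVER_NEUTRAL = Decimal("10.95")
GAIN_STRESS_DIRECTED_OVER_MULTISTYLE = Decimal("15.43")


def implied_rates():
    """Derive rates the abstract does not state, under the additive assumption."""
    # Two independent routes to the stress-directed 1-D HMM rate.
    via_nd = ND_HMM_RATE - GAIN_ND_OVER_STRESS_DIRECTED
    via_neutral = NEUTRAL_1D_RATE + GAIN_STRESS_DIRECTED_OVER_NEUTRAL
    return {
        "stress_directed_via_nd": via_nd,
        "stress_directed_via_neutral": via_neutral,
        # Inferred only; the abstract gives no multi-style rate.
        "multistyle": via_nd - GAIN_STRESS_DIRECTED_OVER_MULTISTYLE,
        # Should match the quoted +26.67 gain over the Neutral trained model.
        "nd_minus_neutral": ND_HMM_RATE - NEUTRAL_1D_RATE,
    }


rates = implied_rates()
for name, value in rates.items():
    print(f"{name}: {value}")
```

Under this reading, both routes give the same stress-directed rate, and the difference between the N-D HMM and the Neutral trained 1-D HMM matches the quoted gain, which suggests the quoted figures are mutually consistent. The multi-style rate remains an inference rather than a reported value.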
Keywords/Search Tags:Stress, Speech, N-D HMM, Classification, Recognition, Using, Neural