
Biologically-inspired noise-robust speech recognition for both man and machine

Posted on: 2005-12-23
Degree: Ph.D.
Type: Dissertation
University: University of Florida
Candidate: Skowronski, Mark D.
GTID: 1458390008483586
Subject: Engineering
Abstract/Summary:
The purpose of this dissertation is to investigate biologically inspired techniques for increasing the robustness of speech recognition, for both man and machine. This is accomplished in three regimes: the time domain, the feature space, and the classifier. The human auditory system is an existence proof for accurate automatic speech recognition and has solved the principal complexities that currently plague machine recognition: conversational speech, noisy environments, and mismatched test/train data.

The three regimes, unique in their relation to biology as well as their role in recognition, demonstrate the efficacy of biologically inspired computation. In the first regime, human speech recognition is improved in the time domain using energy redistribution, a novel algorithm based on psychoacoustic experiments on the relative information density of typical speech across time. In listening experiments, energy redistribution decreased recognition error in noisy environments by 40% compared to the experiment control. In the second regime, the tradeoff between spectral resolution and local signal-to-noise ratio in the frequency domain is controlled by a novel speech front end called human factor cepstral coefficients (HFCC). HFCC is created by combining the known relationship between critical bandwidth and frequency in the human auditory system with the filter bank design of a popular speech feature extraction algorithm, mel frequency cepstral coefficients (MFCC). In automatic speech recognition simulations of isolated words in noisy environments, HFCC outperformed MFCC by 7 dB. In the third regime, an emerging area in information processing, based on observations of the chaotic nature of biological sensory systems, is explored. A nonlinear dynamic system, introduced by Walter Freeman and colleagues, models the olfactory sensory system of rabbits and offers an alternative to the conventional stochastic models used in automatic speech recognition. In the current dissertation, several critical aspects of Freeman's model are advanced, and the model is applied as an oscillatory network associative memory in static pattern classification experiments. Recognition accuracy for vowel phonemes using Freeman's model is comparable to the optimum performance of a Hamming classifier.
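
The HFCC idea described in the abstract, setting filter bandwidth from the critical-bandwidth relationship rather than from the spacing of neighboring filters, might be sketched roughly as follows. This is an illustrative sketch only, not the dissertation's implementation. It assumes the Moore and Glasberg equivalent rectangular bandwidth (ERB) approximation as the critical-bandwidth formula and the common 2595·log10(1 + f/700) mel mapping; the function names, the scaling parameter `e_factor`, and the simple equal-mel spacing of center frequencies are all hypothetical choices made for the example.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale mapping (an assumed variant; several exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def erb_hz(f_hz):
    """Moore-Glasberg ERB approximation in Hz (the polynomial takes f in kHz).

    Assumed here as the critical-bandwidth relationship.
    """
    f_khz = f_hz / 1000.0
    return 6.23 * f_khz ** 2 + 93.39 * f_khz + 28.52


def center_frequencies(n_filters, f_low_hz, f_high_hz):
    """Center frequencies equally spaced on the mel scale (a simplification)."""
    mel_low, mel_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (mel_high - mel_low) / (n_filters + 1)
    return [mel_to_hz(mel_low + step * (i + 1)) for i in range(n_filters)]


def filter_bandwidths(centers_hz, e_factor=1.0):
    """Bandwidth of each filter taken from the ERB at its center frequency.

    e_factor is a hypothetical scale knob: larger values widen every filter,
    trading spectral resolution for more signal energy per filter.
    """
    return [e_factor * erb_hz(fc) for fc in centers_hz]


if __name__ == "__main__":
    centers = center_frequencies(n_filters=5, f_low_hz=0.0, f_high_hz=4000.0)
    for fc, bw in zip(centers, filter_bandwidths(centers)):
        print(f"center {fc:8.1f} Hz   bandwidth {bw:7.1f} Hz")
```

Because the bandwidths are computed independently of the center-frequency spacing, widening or narrowing the filters (here through `e_factor`) does not move the centers, which is one way to expose the resolution versus local signal-to-noise tradeoff that the abstract mentions.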
Keywords/Search Tags: Recognition