
Invariant speech recognition and auditory object formation: Neural models and psychophysics

Posted on: 1996-10-14
Degree: Ph.D.
Type: Dissertation
University: Boston University
Candidate: Govindarajan, Krishna K
Full Text: PDF
GTID: 1468390014988097
Subject: Psychology
Abstract/Summary:
This dissertation investigates three topics concerning variability and robustness in speech perception: variability of the speech signal across speakers, variability due to speaking-rate effects, and the robustness of speech perception in noisy environments.

Given that the speech signal corresponding to a given phoneme can vary considerably across speakers, invariant speech perception can be facilitated by normalizing the signal across speakers. In chapter 1, 160 intrinsic and extrinsic speaker normalization methods are compared using a neural network (fuzzy ARTMAP) and K-Nearest Neighbor (K-NN) categorizers trained and tested on disjoint sets of speakers from the Peterson-Barney vowel database. ARTMAP and K-NN show similar trends, with K-NN performing better but requiring about ten times as much memory. The optimal intrinsic normalization method is the bark scale using the differences between all frequencies, while the optimal extrinsic method is a linear transformation of the vowel space to a canonical representation.

In chapter 2, psychophysical studies of adaptation to the mean silence duration between two different stop consonants are examined. Using natural speech stimuli, the first experiment shows that the category boundary between hearing only one stop consonant and hearing both varied as a function of the distribution of silent intervals. The second experiment shows that the variance of the distribution did not significantly affect the boundary, and the final experiment shows sequential effects in the adaptation process. Finally, a model of the adaptation process is developed that emulates the data.

In environments with multiple sound sources, the auditory system is capable of teasing apart the jumbled impinging signal into distinct mental objects. Chapter 3 presents a neural network model of auditory scene analysis, which groups frequency components based on pitch and spatial-location cues and allocates the components to different objects.
While location primes the grouping mechanism, segregation is based solely on harmonicity. The model qualitatively emulates results from psychophysical grouping experiments, such as how adding harmonic components helps a tone sweeping upward in frequency overcome grouping, at the intersection point, due to frequency proximity with a downward-sweeping tone. It also emulates illusory percepts, such as the illusion of a tone continuing through noise.
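The intrinsic normalization that chapter 1 finds best can be illustrated with a short sketch: convert frequencies to the bark (critical-band) scale and use all pairwise bark differences as K-NN features, which cancels an additive per-speaker shift. The abstract does not give the dissertation's exact bark formula or feature set, so the Zwicker-style approximation, the formant values, and the vowel labels below are illustrative assumptions, not Peterson-Barney data.

```python
import math
from itertools import combinations

def hz_to_bark(f):
    """One common Zwicker-style bark approximation (an assumption here;
    the abstract does not specify the dissertation's formula)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def bark_difference_features(freqs_hz):
    """Intrinsic normalization: represent a vowel token by the pairwise
    differences between all its bark-scaled frequencies."""
    barks = [hz_to_bark(f) for f in freqs_hz]
    return [barks[j] - barks[i] for i, j in combinations(range(len(barks)), 2)]

def knn_classify(train, query_feats, k=3):
    """Plain K-NN: majority label among the k nearest training tokens."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(feats, query_feats)), label)
        for feats, label in train
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy usage with made-up (F0, F1, F2) triples, not real measurements:
train = [
    (bark_difference_features([130, 300, 2300]), "iy"),
    (bark_difference_features([120, 310, 2250]), "iy"),
    (bark_difference_features([130, 700, 1200]), "aa"),
    (bark_difference_features([125, 720, 1150]), "aa"),
]
query = bark_difference_features([128, 690, 1230])
print(knn_classify(train, query, k=3))  # prints "aa"
```

Because only bark *differences* enter the feature vector, a speaker whose whole spectrum sits higher on the bark scale maps onto roughly the same point as a lower-voiced speaker producing the same vowel.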
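Chapter 2's finding, that the category boundary tracks the mean of the silence distribution, ignores its variance, and shows sequential (recency) effects, is the qualitative behavior of a leaky integrator. The sketch below is a schematic stand-in under that assumption; the dissertation's actual model form, initial boundary, and adaptation rate are not given in the abstract.

```python
def adapt_boundary(silences_ms, b0=60.0, rate=0.2):
    """Toy leaky-integrator adaptation: after each trial the one-stop /
    two-stop category boundary drifts toward the observed silence duration.
    b0 and rate are illustrative values, not fitted parameters."""
    boundary = b0
    history = []
    for s in silences_ms:
        boundary += rate * (s - boundary)  # track the running mean
        history.append(boundary)
    return history

# A block of long silences pulls the boundary upward toward their mean;
# recent trials dominate, which yields sequential effects:
trace = adapt_boundary([100.0] * 20)
print(round(trace[-1], 1))  # prints 99.5
```

Note that the update depends only on each sample's distance from the current boundary, so two distributions with the same mean but different variances converge to the same boundary, matching the second experiment's null effect of variance.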
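The harmonicity-based segregation in chapter 3 can be caricatured as assigning each frequency component to the fundamental whose harmonic series it best fits. This crude sketch stands in for the neural model's pitch-based grouping; the candidate fundamentals, tolerance, and mixture values are all assumptions for illustration.

```python
def group_by_harmonicity(components_hz, f0_candidates, tol=0.03):
    """Assign each component to the candidate f0 whose nearest harmonic
    lies within a relative tolerance; components fitting no series well
    are left ungrouped. A stand-in for the model, not the model itself."""
    groups = {f0: [] for f0 in f0_candidates}
    ungrouped = []
    for f in components_hz:
        best = None  # (f0, relative error) of the best-fitting series
        for f0 in f0_candidates:
            n = round(f / f0)  # nearest harmonic number
            if n >= 1:
                err = abs(f - n * f0) / (n * f0)
                if err <= tol and (best is None or err < best[1]):
                    best = (f0, err)
        if best:
            groups[best[0]].append(f)
        else:
            ungrouped.append(f)
    return groups, ungrouped

# Two interleaved harmonic complexes (100 Hz and 130 Hz) plus an odd partial:
mix = [100, 200, 300, 130, 260, 390, 455]
groups, rest = group_by_harmonicity(mix, [100, 130])
print(groups)  # prints {100: [100, 200, 300], 130: [130, 260, 390]}
print(rest)    # prints [455]
```

The 390 Hz component is within tolerance of both series (4 × 100 Hz and 3 × 130 Hz) but is claimed by the 130 Hz series because its fit is exact, a toy analogue of harmonicity overriding mere frequency proximity.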
Keywords/Search Tags: Speech, Across speakers, Model, Auditory, Neural, Signal