
Models of human phone transcription in noise based on intelligibility predictors

Posted on: 2010-11-21 | Degree: Ph.D | Type: Dissertation
University: University of Illinois at Urbana-Champaign | Candidate: Lobdell, Bryce E | Full Text: PDF
GTID: 1448390002973000 | Subject: Speech communication
Abstract/Summary:
Evidence exists that speech recognition by humans is, at some level, simple pattern recognition. It would be worth knowing the level at which human speech perception operates in this manner, and the structure of the pattern recognition system employed. The approach taken here is to collect human classifications of phones (i.e., phone transcriptions), model the auditory signals these speech sounds evoke (to the extent possible), and then use tools from statistics, pattern recognition, and information theory to constrain a model of human phone transcription.

It can be inferred from statistics that human speech recognition must use a much simpler representation of speech than sampling at full bandwidth. The representation used (1) results from the statistics of speech signals in the auditory periphery, (2) is probably optimal in a sense related to the capabilities of the human pattern recognizer, and (3) has a strong effect on the error rates and error patterns exhibited by humans in speech recognition. A wealth of research exists on human behavior in speech recognition tasks. One such investigation, which resulted in the Articulation Index method of predicting speech intelligibility, is especially relevant because it models human error rates under noise, amplification, and filtering. These conditions are readily analyzed using statistics and information theory. The Articulation Index studies revealed that human error rates for phone transcription are relatively insensitive to changes in amplification, while they are relatively sensitive to changes in the speech-to-noise ratio.

This dependence on the speech-to-noise ratio has been interpreted as implying certain properties of the human pattern recognizer. This dissertation examines that interpretation more thoroughly using new perceptual data which could not have been available at the time the Articulation Index was developed.
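To make the Articulation Index method concrete, the following is a minimal sketch of the classic band-audibility computation in the style of ANSI S3.5: each frequency band contributes its importance weight scaled by how audible speech is in that band, with the conventional 30 dB audible range (from roughly -12 dB to +18 dB speech-to-noise ratio). The specific band SNRs and equal weights below are illustrative assumptions, not values from the dissertation.

```python
def articulation_index(band_snrs_db, band_weights):
    """Importance-weighted sum of per-band audibilities, each clipped to [0, 1].

    A band contributes fully once its speech-to-noise ratio exceeds about
    +18 dB and contributes nothing below about -12 dB (the conventional
    30 dB dynamic range used in Articulation Index calculations).
    """
    assert abs(sum(band_weights) - 1.0) < 1e-9, "importance weights should sum to 1"
    ai = 0.0
    for snr_db, weight in zip(band_snrs_db, band_weights):
        audibility = min(max((snr_db + 12.0) / 30.0, 0.0), 1.0)
        ai += weight * audibility
    return ai

# Illustrative example: four equally important bands at varying
# speech-to-noise ratios (hypothetical numbers, for demonstration only).
snrs = [-20.0, 0.0, 10.0, 30.0]
weights = [0.25, 0.25, 0.25, 0.25]
print(round(articulation_index(snrs, weights), 3))
```

This captures the property discussed above: the index depends on band-level speech-to-noise ratios rather than overall amplification, since raising speech and noise together leaves every band's SNR, and hence the index, unchanged.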
Some of the data are phone classifications in two noise spectra and at various noise levels, which permits us to determine the specificity of the Articulation Index predictions. Data on the detection of speech in noise are modeled and related to the intelligibility of speech. Another data set allows us to separate some sources of entropy in noisy conditions, which is practically important for the design of experiments and theoretically important for describing the human pattern recognizer.

Four model representations of noisy speech in the brain are compared on the basis of the performance of a pattern recognizer using those representations. The model representations derived from the Articulation Index exhibit an interesting property: they are more robust to mismatches between the testing and training data sets.

The key findings of the experiments are the following: (1) the Articulation Index model of recognition accuracy works very well in some phonetic contexts and fails in others, (2) the Articulation Index model is the average of a number of more specific models with their own parameters, (3) audibility of speech does not explain all variation, but it explains a great deal of it, and (4) phonetic importance is not spread uniformly over time and frequency. We speculate that humans may use different representations of speech depending on the phonetic context, and we suggest experiments controlling frequency-band-specific signal-to-noise ratio and level to resolve these issues.
Keywords/Search Tags: Human, Noise, Phone, Speech, Model, Articulation index, Pattern, Level