
Research on Perceptual Cues of Nasal Consonants

Posted on: 2016-03-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Bai
GTID: 1108330473956116
Subject: Information and Communication Engineering
Abstract/Summary:
Speech sounds are characterized by time-varying spectral patterns called acoustic cues or perceptual cues. When a speech wave propagates along the basilar membrane, unique perceptual cues (also called events), which define the basic units of speech perception, are resolved. The relationship between acoustic cues and perceptual units has been a key problem in the interdisciplinary study of speech perception, and solving it would benefit a wide range of applications. However, the variability that different talkers and different conditions introduce into natural speech makes the problem very hard to solve. In addition, the interdisciplinary nature of the topic demands breakthroughs and collaborative effort across multiple fields, such as mathematics, physics, psychology, physiology and electrical engineering, which makes the research even harder to conduct. As a result, progress has been very slow; various hypotheses have been raised, leading to debates and contradictory conclusions. In the past decade, the human speech research group at the University of Illinois at Urbana-Champaign has brought revolutionary advances to the study of consonant perceptual cues, especially for plosives and fricatives. However, because of the complexity of nasal consonants, research on their perceptual cues was still at an early exploratory stage before the work in this thesis. The available analytical methods could not explain various phenomena observed in nasal perception experiments, and some experimental results even contradicted one another. Most of this thesis was completed during more than four years spent in that lab under a joint PhD program.
By drawing on advanced research philosophy, methodology and experimental facilities, and by taking the characteristics of nasal consonants into account, we obtained satisfactory results: we identified relatively stable perceptual cues for the nasal consonants /m/ and /n/ in natural speech from different talkers, and we investigated their perception-related properties. The main results of this thesis are:
1. A variety of speech perception theories are studied and summarized, including the motor theory, the direct realist theory, the Fuzzy-Logical Model of Speech Perception, and the Fletcher-Allen Model of Speech Perception. In the process, we evaluate the advantages and disadvantages of these models and examine both the methodologies used to search for perceptual cues and the various parameters that represent them (such as voice onset time). We find that many key historical experiments used synthetic speech to control speech variability. However, synthesizing speech requires prior knowledge of the perceptual cues, so synthetic speech contains only the parts of the signal that its designers already understood or chose to include. Natural speech stimuli, on the other hand, have problems of their own; for example, the selected stimuli may not be representative. As a result, researchers still debate where the perceptual cues of different consonants lie in the time-frequency domain, and whether these cues are sufficient and necessary for correct consonant perception.
2. Theories and experimental tools relating to perceptual cues are studied, including the cochlear model for speech signal decomposition and compression, masking, the confusion matrix, the multiband product rule, the articulation index model, and the AI-gram, a computational model that visualizes intelligibility in the time-frequency domain. Based on the Fletcher-Allen model, the theories underlying the 3-Dimensional Deep Search (3DDS) are investigated.
Using three independent experiments and the properties of their data, we propose a methodology for analyzing nasal consonants. We divide nasal consonant sounds into three types (non-overlapping pattern, overlapping pattern and critical overlapping pattern) and then explore and analyze the perceptual cues for each type. We use a local analysis method to determine whether each part of a nasal perceptual cue is necessary and sufficient for nasal consonant perception. We find that some nasal perceptual cue regions contain two or three independent parts that allow the subject to identify the target accurately; cues of this type can be defined as redundant cues. These findings explain many previously unexplained and contradictory phenomena from earlier studies. In addition, we discuss the consistency and variability of nasal perceptual cues, the perceptual effects of conflicting cues, robustness, and the part of the consonant that precedes the necessary duration of the cue. We identify perceptual cues that are relatively stable across talkers under noisy conditions: the perceptual cue for /n/ lies at the onset of the F2 formant region, in 939-2164 Hz, and the perceptual cue for /m/ lies at the onset of the F2 formant region, in 363-1300 Hz. Together with the perceptual cues identified for other types of consonants, these results make it possible to develop perceptual-cue-based speech signal processing, such as perceptual-cue-based speech compression coding, speech enhancement, and automatic speech recognition in noise.
3. Digital signal processing is used to modify the perceptual cues of the nasal consonants (cue amplification, cue attenuation and cue removal) and to investigate the effects of such modification. We define the perception curve shift as the measure of intelligibility change.
We perform nonlinear regression on the perceptual score as a function of SNR to generate perception curves, and we use a minimum mean square error (MMSE) calculation to estimate the curve shift. We assess the effects of cue modification on correct identification of the speech sound through statistical analysis from multiple perspectives: ΔSNR, SNR90, the relationship between the amplitude of the cue modification and the magnitude of ΔSNR, the relationship between paired ΔSNR values, and the relationship between paired SNR90 values. The important conclusion regarding the perceptual cue is that modifying only a small cue region achieves the same effect as modifying the whole speech sound. This result not only confirms that the perceptual cue carries key information for consonant recognition, but also validates the perceptual cue regions of the nasal consonants that we located with the 3DDS method. This perceptual-cue-based speech modification methodology offers an efficient approach to speech enhancement in noise.
4. Nasal perceptual cues are investigated through consonant transformation. We present a method for transforming between /m/ and /n/ that only needs to manipulate the perceptual cues. Removing the perceptual cue of /na/ stably transforms the speech sound into /ma/, and enhancing the conflicting cue in /ma/ (i.e. the region that contains the cue of /na/) stably transforms the speech sound into /na/. This two-way transformation between /ma/ and /na/ provides strong evidence that the nasal cues we identified are correct. It also reveals a new potential application of perceptual cues: sound transformation.
5. The contribution of the nasal murmur to accurate perception of nasal consonants is studied. We design and perform psychoacoustic studies to investigate this topic.
We analyze and explain our experimental data using the AI-gram, a new computational model that visualizes intelligibility in the time-frequency domain, together with the nasal cues we identified. We define information-complementary parts of consonants, similar to the nasal murmur, as secondary cues. From the perspective of the clarity of the primary perceptual cues, we explain the conditions under which the nasal murmur plays its complementary role, and we draw an important conclusion regarding perceptual scores: the nasal murmur has a complementary effect on the primary cue. How this complementary effect appears in the perceptual scores is correlated with the SNR level but is not determined by it; instead, it is determined by the clarity of the primary cues. In other words, even at high SNR, if the primary cue is not strong, the nasal murmur can still contribute to perception through its complementary function. These results provide new insight into why different research groups using different stimuli reported different observations, which to some extent helps reconcile the long-standing debate over the nasal murmur's perceptual contribution. By analyzing the effect of the nasal murmur on confusion patterns, we conclude that the nasal murmur can greatly suppress confusions. We explain the cause of the change in confusion patterns at a fundamental level using the AI-gram, and thus describe from another perspective the contribution of the nasal murmur region to correct recognition of nasal consonants.
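The perception-curve analysis mentioned in result 3 (nonlinear regression of score against SNR, then a minimum mean square error estimate of the curve shift) can be illustrated with a minimal sketch. The abstract does not give the curve model or the fitting procedure, so everything here is an assumption made for illustration: the logistic curve shape, the grid-search fit and its parameter ranges, the sign convention for the shift (positive means the modified sound needs a higher SNR), the helper names, and the synthetic scores.

```python
import math

def logistic(snr_db, midpoint, slope, chance=0.0):
    """Assumed perception curve: score rises from `chance` to 1 as SNR grows."""
    return chance + (1.0 - chance) / (1.0 + math.exp(-slope * (snr_db - midpoint)))

def mse(curve, snrs, scores):
    """Mean square error between a curve and measured scores."""
    return sum((curve(s) - p) ** 2 for s, p in zip(snrs, scores)) / len(snrs)

def frange(start, stop, step):
    """Inclusive list of evenly spaced floats."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]

def fit_curve(snrs, scores):
    """Crude nonlinear regression: grid search over midpoint and slope."""
    best = None
    for mid in frange(-30.0, 20.0, 0.25):
        for slope in frange(0.05, 1.5, 0.05):
            err = mse(lambda s: logistic(s, mid, slope), snrs, scores)
            if best is None or err < best[0]:
                best = (err, mid, slope)
    return best[1], best[2]

def estimate_shift(snrs, modified_scores, midpoint, slope):
    """Horizontal shift (dB) of the reference curve that minimizes the
    mean square error against the scores of the modified sound."""
    best_delta, best_err = 0.0, float("inf")
    for delta in frange(-20.0, 20.0, 0.1):
        err = mse(lambda s: logistic(s, midpoint + delta, slope),
                  snrs, modified_scores)
        if err < best_err:
            best_delta, best_err = delta, err
    return best_delta

def snr_at(score, midpoint, slope, chance=0.0):
    """SNR at which the fitted curve reaches `score` (e.g. 0.9 for SNR90)."""
    return midpoint + math.log((score - chance) / (1.0 - score)) / slope

# Synthetic, noise-free scores (hypothetical data, not from the thesis).
snrs = [-18, -12, -6, 0, 6, 12]
original = [logistic(s, -8.0, 0.4) for s in snrs]    # unmodified sound
attenuated = [logistic(s, -2.0, 0.4) for s in snrs]  # cue attenuated

mid, slope = fit_curve(snrs, original)
delta_snr = estimate_shift(snrs, attenuated, mid, slope)
snr90 = snr_at(0.9, mid, slope)
```

In this sketch only the horizontal position of the curve changes between conditions, so a single number summarizes the effect of a cue modification. Real listening data are noisy and may also change the slope, in which case a proper least-squares routine would be a better choice than the grid search used here.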
Keywords/Search Tags:perceptual cue, conflicting perceptual cue, noise masking, nasal consonant recognition, nasal murmur