Speech Signal Processing Based On Auditory Neural Mechanisms

Posted on: 2010-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: W Guo
Full Text: PDF
GTID: 2178360275970256
Subject: Computer software and theory
Abstract/Summary:
Spoken language is not only an extremely important tool in everyday communication among humans, but also one of the greatest feats of mammalian brain evolution over the past million years. This highly complex function, which separates the human race from the rest of the animal kingdom, involves neural processing of information ranging from sound to graphical and even abstract symbolic representations of language. Despite our daily use of this function, a general understanding of the 'where' and 'how' of language processing in our cortical systems is still lacking. In the past few decades, psychologists and neuroscientists have made a great many observations on the mechanisms of the human peripheral auditory system, the early-stage brain stem, and the auditory cortex. These observations, although they cannot readily solve the mystery of spoken language processing, have a great impact on our further exploration of this subject.

On the other hand, since the emergence of electronic communication and computer technology, people have been looking at spoken language from the perspective of digital signal processing. The physical vibrations caused by speech are picked up by electronic devices and then processed by computers. The acquisition and processing of digital speech signals form an important branch of communication and electronic engineering. Speech recognition, the process of transcribing digital speech signals into written words, is among the most researched and most difficult topics in this area. However, even though signal processing technology and the computational capability of modern computers are increasingly powerful, digital spoken language processing still cannot compete with the ability of the human brain. Speech processing by the human brain is far more robust than by computers.
The superiority of the neural system in this task may suggest that if digital processing of spoken language can mimic the human brain in some fashion, performance will improve.

In this thesis, we simulate several neural processing mechanisms of the auditory system algorithmically and incorporate them into a speech recognition framework to test these modules. We first use non-negative matrix factorization (NMF) as a basis learning method for speech signals, and use the learned bases as computational models of the spectro-temporal receptive fields (STRFs) of neurons in the auditory cortex. These model neurons can work as a feature extraction system for speech, and our experiments show that the resulting features are much more robust to noise than the conventional features used in speech recognition applications. We also use a modified version of NMF, orthogonal non-negative matrix factorization (ONMF), as a tool to extract one of the most important aspects of speech: the fundamental frequency. Experiments show that this method is also robust to noise and can track multiple fundamental frequencies simultaneously, which makes it superior to many conventional methods.

Overall, we present some novel methods in speech signal processing inspired by the human auditory system, which achieve favorable results. These methods still have room for improvement in performance, since knowledge about the auditory system is still incomplete. But these methods work in an interdisciplinary fashion and may point to future directions for both speech signal processing and neuroscience research.
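
As an illustration of the basis-learning step mentioned in the abstract, the following is a minimal sketch of NMF applied to a non-negative "spectrogram-like" matrix. The abstract does not state which NMF cost function, update rule, or parameters the thesis uses, so this sketch assumes the standard multiplicative updates for the Euclidean cost; the function name `nmf`, the rank, the iteration count, and the synthetic data are all hypothetical choices made only for this example.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (freq x frames) as V ~= W @ H.

    Assumption: multiplicative updates for the Euclidean cost.
    The thesis may use a different cost or update rule.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Non-negative random initialization of bases (W) and activations (H).
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Element-wise updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in for a magnitude spectrogram (not real speech data).
rng = np.random.default_rng(1)
V = rng.random((64, 4)) @ rng.random((4, 100))

W, H = nmf(V, rank=4)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this sketch, each column of `W` plays the role of a learned spectral basis, and each row of `H` gives that basis's activation over time. Treating such bases as receptive-field models, as the abstract describes, would additionally require spectro-temporal patches spanning several frames, which this example does not attempt.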
Keywords/Search Tags: auditory system, speech signal processing, speech recognition, pitch tracking