
Real Time Digital Speech Signal Processing System: Theory And Applications

Posted on: 2005-03-25   Degree: Doctor   Type: Dissertation
Country: China   Candidate: W G Chen   Full Text: PDF
GTID: 1118360185987850   Subject: Mechanical and electrical engineering
Abstract/Summary:
This dissertation focuses on voice activity detection (VAD) under adverse background noise, currently an active topic in speech signal processing. Driven by the ever-growing demand for better communication technology, speech signal processing has advanced rapidly in recent decades. IP telephony is already in mass use and, thanks to its favorable price-performance ratio, has become a secret weapon for communication companies in their competition. In speech recognition, speaker-independent large-vocabulary continuous speech recognition has been realized and is making its way from the laboratory to industrial application. Speaker identification has appeared in industrial applications as a next-generation security technology. Many traditional simplex communication devices can now act as duplex systems with the help of a voice-activated module, which is in effect a voice activity detector. In all of these technologies the VAD is a core signal processing unit, and it is the bottleneck that holds them back from mature mass application.
In a real communication environment, background noise varies widely in type and magnitude. Different noises show different characteristics in the time and frequency domains and share no common statistical features. The speech signal is itself a complex, time-varying signal: different languages and different phonemes (the basic units of pronunciation in speech) differ greatly from one another. In short, both the noise signals and the speech signal are very complex.
If no specific application background is assumed, it is nearly impossible to build a discrimination function that distinguishes speech from background noise with current VAD technologies.
Speech signal processing is an interdisciplinary technology that draws on traditional digital signal processing, statistical signal processing, system identification and modeling, acoustics and linguistics. In this dissertation, I started my research from the mechanism of human speech production. I then studied voiced and unvoiced phonemes extensively, building on the work of previous researchers. The source-filter model is a widely used tool in speech signal processing. When modeling voiced and unvoiced signals with it, I found that the linear predictive coding (LPC) spectrum is a robust feature, especially in low-SNR environments, compared with the features used by other VAD algorithms. The speech signal is short-time stationary. More precisely, it is stationary within a very short period, say 20-30 ms, but appears chaotic over longer time scales. This is a relatively unique property. The peaks of the LPC spectrum are a good indication that speech is present, provided that we can track them and judge that they are speech formants. Based on this assumption, I put forward a new VAD algorithm, called VAD based on phoneme formant tracking. Finally, I compared the performance of my algorithm with that of the ITU-T G.729B algorithm. The new algorithm outperformed G.729B by roughly 20% in adverse noise environments and is more robust against background noise.
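The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the general idea it describes: cut the signal into short frames, fit an AR (LPC) model to each frame, and take the peaks of the LPC spectrum as formant candidates. It uses the standard autocorrelation method with the Levinson-Durbin recursion, which may differ from what the dissertation actually does. All function names, the model order of 10, the 25 ms frame length, the 256-point frequency grid and the toy two-resonance signal are assumptions made for illustration. Tracking the peaks across frames and the final speech/noise decision are not shown.

```python
import cmath
import math


def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no normalization)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Fit A(z) = 1 + a1 z^-1 + ... ; returns (a, residual energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate frame, stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_spectrum(a, err, n_bins):
    """LPC power spectrum err / |A(e^{jw})|^2 on n_bins points in [0, pi)."""
    spec = []
    for b in range(n_bins):
        w = math.pi * b / n_bins
        resp = sum(c * cmath.exp(-1j * w * j) for j, c in enumerate(a))
        spec.append(err / max(abs(resp) ** 2, 1e-12))
    return spec


def spectral_peaks(spec, fs):
    """Frequencies in Hz of the local maxima of the spectrum."""
    n = len(spec)
    return [b * fs / (2.0 * n) for b in range(1, n - 1)
            if spec[b - 1] < spec[b] >= spec[b + 1]]


def frame_formant_candidates(signal, fs, order=10, frame_ms=25, n_bins=256):
    """Per-frame list of LPC spectral peaks (formant candidates)."""
    size = int(fs * frame_ms / 1000)
    result = []
    for start in range(0, len(signal) - size + 1, size):
        r = autocorr(signal[start:start + size], order)
        if r[0] <= 1e-12:  # silent frame: nothing to model
            result.append([])
            continue
        a, err = levinson_durbin(r, order)
        result.append(spectral_peaks(lpc_spectrum(a, err, n_bins), fs))
    return result


def toy_vowel(fs, n, formants=(500.0, 1500.0), radius=0.96):
    """Hypothetical test signal: impulse response of cascaded resonators."""
    a = [1.0]
    for f in formants:
        sec = [1.0, -2.0 * radius * math.cos(2 * math.pi * f / fs), radius ** 2]
        a = [sum(a[i] * sec[k - i] for i in range(len(a)) if 0 <= k - i < 3)
             for k in range(len(a) + 2)]
    y = []
    for t in range(n):
        x = 1.0 if t == 0 else 0.0
        y.append(x - sum(a[j] * y[t - j] for j in range(1, len(a)) if t - j >= 0))
    return y


if __name__ == "__main__":
    fs = 8000
    print(frame_formant_candidates(toy_vowel(fs, 200), fs, order=4))
```

A detector in the spirit of the abstract would then follow these candidates from frame to frame and accept a frame as speech only when the peaks behave like formants. How that judgment is made is specific to the dissertation and is not reproduced here.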
Keywords/Search Tags: VAD, system modeling, AR model, formant tracking