Spectral refinements to speech enhancement

Posted on: 2010-07-21
Degree: Ph.D.
Type: Dissertation
University: Florida Atlantic University
Candidate: Charoenruengkit, Werayuth
Full Text: PDF
GTID: 1448390002488740
Subject: Engineering
Abstract/Summary:
The goal of a speech enhancement algorithm is to remove noise and recover the original signal with as little distortion and residual noise as possible. Most successful real-time algorithms operate in the frequency domain, where the spectral amplitude of the clean speech is estimated for each short-time frame of the noisy signal. State-of-the-art short-time spectral amplitude estimators express the clean spectral amplitude in terms of the power spectral density (PSD) function of the noisy signal. In principle, the PSD has to be computed from a large ensemble of signal realizations; in practice, it can only be estimated from a finite-length sample of a single realization. The estimation errors introduced by these limitations cause the solution to deviate from the optimum. Various spectral estimation techniques, many with added spectral smoothing, have been investigated for decades to reduce these errors. However, these algorithms do not significantly address the quality of the speech as perceived by a human listener.

This dissertation presents analysis and techniques that offer spectral refinements for speech enhancement.

We present an analytical framework for the effect of spectral estimate variance on the performance of speech enhancement. We use the variance quality factor (VQF) as a quantitative measure of the estimated spectra. We show that reducing the VQF of the spectral estimator significantly reduces the VQF of the enhanced speech. The Autoregressive Multi-taper (ARMT) spectral estimate is proposed as a low-VQF spectral estimator for use in speech enhancement algorithms.

An innovative method of incorporating a speech production model using multi-band excitation is also presented as a technique to emphasize the harmonic components of the glottal speech input. Preconditioning the noisy estimates by exploiting other sources of information, such as pitch estimation and the speech production model, effectively increases the localized narrow-band signal-to-noise ratio (SNR) of the noisy signal, which is subsequently denoised by the amplitude gain. Combined with voicing structure enhancement, the ARMT spectral estimate delivers enhanced speech with the sound clarity desired by human listeners. The resulting improvements in the enhanced speech are observed to be significant in both objective and subjective measurements.
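The link between estimator choice and spectral variance can be illustrated with a small experiment. The sketch below is not the ARMT estimator from the dissertation; it swaps in a plain sine-taper multitaper estimate as a stand-in and compares it with a single-window periodogram on white Gaussian noise. The function names, the parameter values, and the use of "variance divided by squared mean" as the variance measure are all assumptions made for illustration; the abstract does not give the exact VQF definition.

```python
import cmath
import math
import random


def sine_tapers(n, k):
    """Return k orthonormal sine tapers of length n (illustrative choice)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [
        [scale * math.sin(math.pi * (j + 1) * (i + 1) / (n + 1)) for i in range(n)]
        for j in range(k)
    ]


def power_at_bin(x, taper, bin_index):
    """Squared magnitude of the tapered DFT of x at one frequency bin."""
    n = len(x)
    acc = 0j
    for i in range(n):
        acc += taper[i] * x[i] * cmath.exp(-2j * math.pi * bin_index * i / n)
    return abs(acc) ** 2


def normalized_variance(values):
    """Sample variance divided by squared sample mean (assumed variance measure)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return var / mean ** 2


def compare_estimators(n=64, k=5, trials=400, bin_index=16, seed=0):
    """Return (periodogram, multitaper) normalized variance at one bin."""
    rng = random.Random(seed)
    rect = [1.0 / math.sqrt(n)] * n  # unit-energy rectangular window
    tapers = sine_tapers(n, k)
    single, multi = [], []
    for _ in range(trials):
        # One finite-length realization of white Gaussian noise.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        single.append(power_at_bin(x, rect, bin_index))
        # Multitaper estimate: average of the k tapered eigenspectra.
        multi.append(sum(power_at_bin(x, t, bin_index) for t in tapers) / k)
    return normalized_variance(single), normalized_variance(multi)


if __name__ == "__main__":
    periodogram_nv, multitaper_nv = compare_estimators()
    print("periodogram:", periodogram_nv)
    print("multitaper: ", multitaper_nv)
```

For white noise at a bin away from DC and Nyquist, theory suggests a normalized variance near 1 for the single-window periodogram and near 1/k for an average over k approximately independent tapered estimates, so the second value should come out well below the first. The price of the lower variance is coarser frequency resolution, which grows with the number of tapers.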
Keywords/Search Tags:Speech, Spectral, Signal, VQF