
Speech Enhancement Algorithm Based On Perceptually Modified Short-term Spectral Magnitude Estimation

Posted on: 2008-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Wang
GTID: 2178360212996379
Subject: Signal and Information Processing
Abstract/Summary:
Speech processing systems that transmit or store speech signals are usually designed on the assumption that they operate in a noise-free environment. In real-world environments, however, background interference in the form of additive background noise and channel noise drastically degrades their performance. The goal of speech enhancement (also called noise reduction) techniques is to improve various aspects of the quality of the enhanced speech signal.

Depending on the number of available signal channels, speech enhancement algorithms can be classified into two groups: single-channel and multichannel algorithms. The single-channel case, in which speech and noise are present together in one channel, is the most common situation and also one of the most difficult, since no reference signal for the noise is available and the clean speech cannot be accessed before it is corrupted by the noise. The performance of single-channel systems is usually limited because they tend to improve the quality of the noisy signal at the expense of some loss of intelligibility.

Single-channel enhancement systems can be broadly divided into four categories: noise suppression that exploits the periodicity of the speech or the noise; model-based speech enhancement; enhancement based on short-time spectral amplitude estimation; and enhancement based on perceptual criteria. Of these, algorithms based on short-time spectral amplitude estimation have been the most widely developed.

The second part of this thesis introduces the MMSE spectral magnitude estimator. The squared-error cost function, however, may not be subjectively meaningful: small and large squared estimation errors do not necessarily correspond to good and poor speech quality, respectively.
Moreover, the squared-error criterion does not necessarily produce estimators that preserve spectral peak information or that take auditory masking effects into account. It also treats positive and negative estimation errors in the same way, although their perceptual effects in speech enhancement are not the same.

To overcome these shortcomings of the squared-error cost function, the third part introduces Bayesian estimators of the short-time spectral magnitude of speech based on perceptually motivated distortion measures. Such distortion measures, which have been applied successfully in speech recognition, have been shown to be subjectively more meaningful than the squared-error measure. The Bayesian estimators are derived to take auditory masking effects into account: noise lying close to high-energy signal components is masked, so the auditory system cannot distinguish it from the signal. Exploiting this property, the estimators place more emphasis on spectral peaks than on spectral valleys. The perceptually weighted error criterion is implemented by weighting the error spectrum with the inverse spectrum of the original signal, so that spectral peaks and spectral valleys are not processed in the same way. These Bayesian estimators perform better than the MMSE spectral magnitude estimator but require more computation.

Most classical speech enhancement techniques require two parameters: the a priori SNR and the a posteriori SNR. The accuracy of the a priori SNR estimate affects the performance of the enhancement system. Most speech enhancement algorithms estimate the a priori SNR with the decision-directed approach, which is widely used because it is easy to compute and reduces musical noise to an acceptable level. The delay inherent in the decision-directed approach, however, is a drawback, especially during speech transients.
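The abstract does not state the exact perceptual cost function or the resulting gain. As an illustration only, assume the squared error is weighted by the inverse of the clean spectral magnitude X, i.e. d(X, X̂) = (X - X̂)²/X (one possible reading of the weighting described above; the thesis may instead weight by the inverse power spectrum). Setting the derivative of the posterior expected cost to zero then gives the Bayesian estimator in closed form:

```latex
\begin{aligned}
\hat{X} &= \arg\min_{\hat{X}} \; E\!\left[\frac{(X-\hat{X})^2}{X} \,\middle|\, Y\right] \\
\frac{\partial}{\partial \hat{X}}\, E\!\left[\frac{(X-\hat{X})^2}{X} \,\middle|\, Y\right]
  &= -2\,E\!\left[\frac{X-\hat{X}}{X} \,\middle|\, Y\right]
   = -2\left(1 - \hat{X}\,E\!\left[X^{-1} \,\middle|\, Y\right]\right) = 0 \\
\Rightarrow\quad \hat{X} &= \frac{1}{E\!\left[X^{-1} \,\middle|\, Y\right]}
\end{aligned}
```

The second derivative, 2E[X^{-1} | Y], is positive, so this is a minimum. Because 1/x is convex for x > 0, Jensen's inequality gives 1/E[X^{-1} | Y] ≤ E[X | Y]: under this assumed weighting the estimate never exceeds the MMSE estimate, and errors are penalized most heavily where the clean magnitude is small.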
This delay introduces a bias in the gain estimate, which limits noise-reduction performance and generates an annoying reverberation effect.

The fourth part of this thesis introduces a two-step approach to estimating the a priori SNR. In the first step, the a priori SNR of the current frame is estimated. In the second step, the a priori SNR of the next frame is estimated and, to compensate for the delay, used as the a priori SNR of the current frame. The two-step approach retains most of the advantages of the decision-directed approach (e.g., greatly reduced musical noise) and successfully removes its annoying reverberation effect.

Speech enhancement algorithms based on short-time spectral amplitude estimation divide the noisy speech into frames and transform each frame to the spectral domain; the clean signal is then estimated by applying a gain function to the noisy DFT coefficients. With a fixed frame size, the variance of the spectral estimate can be large when a stationary speech region is longer than the frame; conversely, when an assumed stationary speech segment is shorter than the frame, smoothing is applied across stationarity boundaries, which degrades speech intelligibility.

The fifth part presents an adaptive speech segmentation approach, which assigns each frame to a corresponding segment that can be regarded as stationary. The segmentation is based on the statistics of the speech and noise signals; if the noise source is completely stationary, it depends only on the stationarity of the speech signal. This statistically based segmentation is used to estimate the spectrum of the noisy signal. The adaptive segmentation approach efficiently improves the SNR when the speech changes abruptly, and it overcomes the loss of intelligibility caused by smoothing across stationarity boundaries.
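The frame-based processing and the two-step a priori SNR estimate described above can be sketched as a single enhancement loop. This is a minimal sketch under several assumptions that do not come from the thesis: a fixed frame size (not the adaptive segmentation), 50% overlap with a periodic sqrt-Hann window, a noise PSD averaged over the first few frames (assumed speech-free), a Wiener gain ξ/(1+ξ), and hypothetical parameter values (`alpha = 0.98`, `xi_min = 1e-3`). The second step follows the commonly cited two-step noise reduction (TSNR) formulation: the decision-directed rule with α = 1, which would give the next frame's estimate, is evaluated on the current frame's own enhanced amplitude. The function name `enhance` is hypothetical.

```python
import numpy as np


def enhance(noisy, frame_len=256, noise_frames=6, alpha=0.98, xi_min=1e-3,
            two_step=True):
    """Hypothetical STSA enhancement loop with a fixed frame size.

    Assumptions (not from the thesis): 50% overlap, periodic sqrt-Hann
    window, noise PSD from the first `noise_frames` frames (assumed
    speech-free), and a Wiener gain xi / (1 + xi).
    """
    x = np.asarray(noisy, dtype=float)
    hop = frame_len // 2                      # frame_len must be even
    # sqrt-Hann analysis * sqrt-Hann synthesis = periodic Hann, which sums to 1
    # at 50% overlap, so a unit gain would reconstruct the input exactly.
    win = np.sqrt(np.hanning(frame_len + 1)[:-1])
    pad = frame_len - hop                     # every input sample sees two frames
    n_frames = int(np.ceil((len(x) + pad) / hop))
    padded = np.zeros((n_frames - 1) * hop + frame_len)
    padded[pad:pad + len(x)] = x

    # Analysis: window each frame and take its DFT (noisy coefficients Y).
    frames = np.stack([padded[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    noise_psd = np.mean(np.abs(spec[:noise_frames]) ** 2, axis=0) + 1e-12

    out_spec = np.empty_like(spec)
    prev_amp = np.zeros(spec.shape[1])        # enhanced |A| of the previous frame
    for m in range(n_frames):
        gamma = np.abs(spec[m]) ** 2 / noise_psd            # a posteriori SNR
        # Step 1: decision-directed a priori SNR (depends on the previous frame).
        xi = (alpha * prev_amp ** 2 / noise_psd
              + (1 - alpha) * np.maximum(gamma - 1, 0))
        xi = np.maximum(xi, xi_min)
        if two_step:
            # Step 2: DD rule with alpha = 1 on this frame's DD-enhanced
            # amplitude |g_dd * Y|^2 / noise_psd, i.e. the "next-frame"
            # estimate reused for the current frame to remove the delay.
            g_dd = xi / (1 + xi)
            xi = np.maximum(g_dd ** 2 * gamma, xi_min)
        gain = xi / (1 + xi)                  # Wiener gain (assumed)
        out_spec[m] = gain * spec[m]
        prev_amp = np.abs(out_spec[m])

    # Synthesis: inverse DFT, synthesis window, overlap-add, remove padding.
    out_frames = np.fft.irfft(out_spec, n=frame_len, axis=1) * win
    out = np.zeros_like(padded)
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += out_frames[i]
    return out[pad:pad + len(x)]
```

Calling `enhance(noisy, two_step=False)` gives the plain decision-directed baseline, which makes it easy to compare the two a priori SNR estimators on the same input.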
Keywords/Search Tags: Speech Enhancement, SNR Estimator, Adaptive Speech Segment, MMSE, Auditory Masking Effects