
Research On Noisy Speech Enhancement In Transform Domain

Posted on: 2009-10-26    Degree: Doctor    Type: Dissertation
Country: China    Candidate: S F Ou    Full Text: PDF
GTID: 1118360245463464    Subject: Communication and Information System
Abstract/Summary:
In everyday life, speech is often corrupted by ambient acoustic noise, which degrades the performance of digital voice processors and can even diminish a communication system's ability to convey information. A speech enhancement system is therefore strongly needed, whose task is to improve speech quality and ensure the reliability of digital voice communication systems.

Depending on the processing domain, speech enhancement algorithms fall into two categories: time-domain and transform-domain methods. Time-domain algorithms process the noisy speech signal without any transformation; they estimate the clean speech directly in the time domain by exploiting the stationarity or correlation properties of speech. Transform-domain algorithms differ in the choices made at three processing stages. The first is the analysis stage, in which the signal is mapped into some domain via a transformation (e.g., the DFT, DCT, or KLT). The second, the heart of most algorithms, is the suppression stage, in which the transformed signal is multiplied by a gain function designed to attenuate the acoustic noise while preserving the speech. The last is the synthesis stage, in which the modified signal is transformed back to the time domain by the inverse transformation.

Transform-domain algorithms are found to enhance noisy speech better than time-domain ones, for several reasons. The main one is that the transformation provides high energy compaction and reduces the correlation among clean-speech samples, so the estimator can process each noisy component individually, which makes it easier to remove the noise embedded in the noisy speech signal.
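The three-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the dissertation's method: it assumes a known noise power spectrum and uses a simple power spectral subtraction gain; the function name and parameters are hypothetical.

```python
import numpy as np

def enhance(noisy, frame_len, noise_psd):
    """Three-stage transform-domain enhancement: DFT analysis,
    spectral gain, inverse-DFT synthesis with 50% overlap-add."""
    hop = frame_len // 2
    win = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    wsum = np.zeros(len(noisy))          # window normalization for overlap-add
    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[start:start + frame_len] * win
        spec = np.fft.rfft(frame)                        # analysis stage
        gamma = np.abs(spec) ** 2 / noise_psd            # a posteriori SNR
        gain = np.maximum(1.0 - 1.0 / gamma, 0.0)        # suppression stage
        out[start:start + frame_len] += np.fft.irfft(gain * spec) * win  # synthesis
        wsum[start:start + frame_len] += win ** 2
    return out / np.maximum(wsum, 1e-12)
```

With an identity gain this structure reconstructs the interior of the input exactly; all of the algorithmic differences between methods live in the gain function.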
After a general survey of speech enhancement algorithms, chapter three addresses the estimation of the a priori signal-to-noise ratio (SNR) in the DFT domain. The well-known decision-directed (DD) approach drastically limits the level of musical noise, but its a priori SNR estimate is biased because it depends on the speech spectrum estimated in the previous frame; the gain function therefore matches the previous frame rather than the current one, which degrades noise reduction performance. The two-step noise reduction (TSNR) technique recently proposed by Plapous remedies this problem of the DD approach. However, the performance of TSNR depends on the choice of gain function, and its estimated a priori SNR still cannot reduce the residual musical noise to the lowest level. To remove the bias of both approaches, a modified two-step a priori SNR estimator, similar in structure to TSNR, is proposed. In its second step, the proposed approach computes the squared magnitude of the clean speech component directly from the a priori SNR estimated by the DD approach, so the result is not tied to a particular gain function; the drawback of the TSNR method is thus removed while the advantages of the DD method are kept. Experimental results show the improved performance of the proposed approach under different noise conditions.

In chapter four, after analyzing why the DCT outperforms the DFT for noisy speech enhancement and the shortcomings of classic DCT-based algorithms, the statistical correlation of successive speech components across time in the DCT domain is investigated. It is found that speech components in successive frames are significantly correlated, and that the correlation coefficient follows a cosine-shaped distribution along the frequency index.
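The two baseline estimators that chapter three builds on, DD and the TSNR refinement, can be sketched as follows. This is an illustration of the standard formulas (DD smoothing plus a Wiener-gain second step), not of the dissertation's modified estimator; the function name and the choice of a fixed noise variance are assumptions.

```python
import numpy as np

def dd_tsnr_xi(Y, noise_var, alpha=0.98):
    """A priori SNR per frame: decision-directed (DD) first step,
    then the TSNR second step that re-estimates the clean-speech
    power through the Wiener gain computed from the DD estimate."""
    n_frames, n_bins = Y.shape
    xi_dd = np.empty((n_frames, n_bins))
    xi_tsnr = np.empty((n_frames, n_bins))
    prev_S2 = np.zeros(n_bins)                 # |S_hat|^2 of previous frame
    for l in range(n_frames):
        gamma = np.abs(Y[l]) ** 2 / noise_var  # a posteriori SNR
        xi_dd[l] = alpha * prev_S2 / noise_var + (1 - alpha) * np.maximum(gamma - 1, 0)
        G = xi_dd[l] / (1 + xi_dd[l])          # Wiener gain from the DD estimate
        S2 = np.abs(G * Y[l]) ** 2             # refined clean-speech power
        xi_tsnr[l] = S2 / noise_var            # second-step a priori SNR
        prev_S2 = S2
    return xi_dd, xi_tsnr
```

On a stationary high-SNR input the second-step estimate reacts within the first frame, while the DD estimate needs several frames to catch up: this is exactly the one-frame bias the chapter targets.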
Based on this result, a novel single-microphone algorithm combining the DCT and Wiener filtering is proposed. The algorithm does not rely on any speech signal model; it attains the optimal estimate of the clean speech components from successive noisy components by minimum mean square error (MMSE) estimation in the DCT domain, and it thereby overcomes the independence assumption that classic methods impose on speech components. Simulation results demonstrate good performance in both objective and subjective tests with different kinds of noise.

Most research on noisy speech enhancement assumes that, in the transform domain, the coefficients of both the clean speech and the noise are jointly zero-mean Gaussian random variables. This Gaussian assumption is motivated by the central limit theorem, since each coefficient is a weighted sum of a large number of speech samples. In chapter five, however, we show that in the DCT domain the Laplacian distribution fits the clean-speech coefficients better than the conventional Gaussian distribution. Based on this finding, we derive MMSE and ML estimators for speech enhancement that employ the Laplacian-Gaussian mixture model proposed by Gazor, which yields better noise reduction than methods based on the Gaussian model. In that approach, however, the Laplacian factor of the clean speech is estimated from the noisy signal rather than the clean one, so the resulting factor is inaccurate because of the interfering noise energy. To improve the performance further, we present two novel approaches to Laplacian factor estimation based on the properties of the generalized Gaussian distribution model and its shape parameter.
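The inter-frame MMSE idea of chapter four can be illustrated with a two-frame Wiener estimator for a single DCT coefficient. This is a sketch under assumed stationary Gaussian statistics, not the dissertation's algorithm: `sig_var`, `noise_var`, and the inter-frame correlation `rho` are hypothetical known parameters.

```python
import numpy as np

def two_frame_wiener(y_cur, y_prev, sig_var, noise_var, rho):
    """MMSE estimate of the clean DCT coefficient s_l from the pair
    (y_l, y_{l-1}), assuming correlation rho between successive
    clean coefficients and independent additive noise."""
    # Covariance of the noisy pair and its cross-covariance with s_l
    Ryy = np.array([[sig_var + noise_var, rho * sig_var],
                    [rho * sig_var, sig_var + noise_var]])
    rys = np.array([sig_var, rho * sig_var])
    w = np.linalg.solve(Ryy, rys)        # 2-tap Wiener weights
    return w[0] * y_cur + w[1] * y_prev
```

Because the previous noisy coefficient carries information about the current clean one, this estimator achieves a lower mean square error than the classic single-frame Wiener gain `sig_var / (sig_var + noise_var)` whenever `rho` is nonzero, which is the advantage the chapter exploits.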
The proposed approaches obtain the Laplacian factor indirectly through its relation to the variance of the clean speech components under the Laplacian assumption, while keeping the resulting method simple. The estimates are unaffected by the noise components and are accurate. Experimental results show the improved performance of the proposed algorithms compared with the original method.

In chapter six, we address multi-channel speech enhancement in the subspace domain. The subspace-based method introduced by Ephraim is an optimal estimator that minimizes the speech distortion subject to the constraint that the residual noise stays below a preset threshold, but the major drawback of single-channel subspace noise reduction is the musical noise it incurs. Multi-channel methods perform well but usually require a large number of microphones. To cope with the drawbacks of both classes, a combination of single- and multi-channel techniques was proposed in [116]: a multi-channel system with a post-filter derived from signal subspace decomposition, in which the covariance matrices required to design the filter are approximated from data gathered by the different microphones. This method, however, has the serious drawback of assuming the noise to be white. To extend the work to colored noise, chapter six proposes an improved multi-channel speech enhancement approach based on the signal subspace. Through simultaneous diagonalization of the covariance matrices of the clean speech and the noise observed by the microphone array, the proposed algorithm estimates the clean-speech signal subspace without any assumption about the stochastic properties of the noise.
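The simultaneous diagonalization step amounts to a generalized symmetric eigenproblem, which `scipy.linalg.eigh` solves directly. The sketch below assumes estimated covariance matrices `R_s` (clean speech) and `R_n` (noise, positive definite); it illustrates the linear-algebra step only, not the full enhancement algorithm.

```python
import numpy as np
from scipy.linalg import eigh

def simultaneous_diagonalize(R_s, R_n):
    """Jointly diagonalize the speech and noise covariances by solving
    R_s v = lam * R_n v.  The returned V satisfies V.T @ R_n @ V = I and
    V.T @ R_s @ V = diag(lam), with no whiteness assumption on the noise."""
    lam, V = eigh(R_s, R_n)              # generalized symmetric-definite problem
    return lam[::-1], V[:, ::-1]         # sort by descending speech-to-noise ratio
```

The eigenvalues act as per-component speech-to-noise ratios, so the clean-speech subspace can be read off from the largest ones even when the noise is colored.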
The algorithm does not rely on any signal model and efficiently attains the optimal estimate of speech corrupted by colored noise, overcoming the restriction of the original method to white noise. Simulation results demonstrate good performance in both objective and subjective tests.

Subspace techniques require an accurate estimate of the eigenvalues and eigenvectors of the noisy-speech covariance matrix. These are usually computed by the KLT after the covariance matrix has been estimated, a process that is very time-consuming; moreover, since speech is not a stationary process, performance can be improved by adaptive subspace tracking. In chapter seven, we adopt the projection approximation subspace tracking (PAST) method introduced by Yang, an adaptive KLT that tracks the eigenvectors of the covariance matrix with an RLS algorithm. We then investigate the probability distribution of the speech components and the correlation between adjacent components in the adaptive KLT domain, and present a new speech model for enhancement that accounts for the time correlation between speech components. Based on this model, a novel enhancement algorithm using maximum a posteriori (MAP) estimation is proposed, which incorporates the inter-frame correlation into the MAP criterion as a joint probability density function, under a Gaussian model for the speech and noise components. The resulting estimator remains simple and avoids the deficiencies of classic noisy speech enhancement approaches in the adaptive KLT domain. In experimental simulations with speech degraded by various noises, the proposed algorithm shows improved performance on a number of objective and subjective measures.

In chapter eight, the conclusions of our work are drawn.
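The PAST recursion used in chapter seven can be sketched in a few lines. This follows the standard published form of Yang's update (an RLS-style rank-one correction of the subspace basis); the function name, forgetting factor, and initialization are assumptions for illustration.

```python
import numpy as np

def past_update(W, P, x, beta=0.995):
    """One step of the PAST algorithm: track the dominant subspace of
    the input covariance (an adaptive KLT) with RLS-type updates.
    W: current n-by-r basis estimate; P: r-by-r inverse correlation."""
    y = W.T @ x                        # project sample onto current basis
    h = P @ y
    g = h / (beta + y @ h)             # RLS gain vector
    P = (P - np.outer(g, h)) / beta    # update inverse correlation of y
    e = x - W @ y                      # residual outside the tracked subspace
    W = W + np.outer(e, g)             # rank-one correction of the basis
    return W, P
```

Each update costs O(nr) operations instead of the O(n^3) eigendecomposition of a batch KLT, which is what makes frame-by-frame tracking of nonstationary speech practical.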
Keywords/Search Tags: Speech enhancement, transform domain, minimum mean-square error, a priori signal-to-noise ratio, correlation, Laplacian factor