Research On Speech Enhancement Method Based On Sparse Representation

Posted on: 2015-03-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y P Zhao
GTID: 1268330428484043
Subject: Communication and Information System
Abstract/Summary:
Speech signals are inevitably degraded by ambient noise during speech communication. At high noise levels, listeners not only struggle to hear each other but also become fatigued and irritated. Noise in the received signal should therefore be reduced, which is the task of speech enhancement, also called noise suppression. The purpose of speech enhancement is to improve the quality and intelligibility of degraded speech by reducing noise efficiently, with as little distortion as possible and without introducing new noise. Improved speech quality reduces listener fatigue. Improved intelligibility may reduce the distortion of the speech signal. Speech enhancement is widely used in speech recognition and speech coding systems, and it is increasingly applied in hands-free devices, hearing aids, and other areas. It also plays an increasingly important role in man-machine dialogue, machine translation, Bluetooth devices, and smart homes. Over the past decades, many mature and efficient speech enhancement algorithms have been developed. They can be broadly divided into four categories: spectral subtraction-based methods, statistical model-based algorithms, signal subspace-based algorithms, and Wiener filtering-type methods.

In the short-time Fourier transform (STFT) domain, the correlation of the speech signal is reduced and most of its energy concentrates at low frequencies, so most speech enhancement algorithms are implemented in that domain. However, in some applications, such as speech coding, an optimal power spectral density estimator may perform better than an amplitude spectrum estimator. The power spectral subtraction method and the magnitude-squared spectrum estimator both assume that the magnitude-squared spectrum of the noisy speech can be expressed as the sum of the clean speech and noise magnitude-squared spectra, which is an approximation of the power spectrum.
Based on this assumption, we propose a speech enhancement method based on the sparse representation of the power spectrum. A sparse representation is the most compact representation that accounts for most or all of the information in a signal as a linear combination of a small number of atoms from an overcomplete dictionary; techniques from non-negative matrix factorization or compressed sensing can be used to find the sparsest such combination. We use the approximate K-singular value decomposition (K-SVD) algorithm with a nonnegativity constraint to train the power spectrum dictionary of clean speech, and the least angle regression (LARS) method to obtain the sparse representation of the clean power spectrum. The reconstructed power spectrum is passed to the signal subspace approach based on the short-time spectral amplitude (SSB-STSA) estimator, and the enhanced speech signal is then obtained by combining the result with the noisy phase and applying the inverse discrete Fourier transform. The termination rule of the LARS algorithm uses a threshold derived from the estimated noise power spectrum: the iteration stops once the l2 norm of the difference between the noisy and reconstructed speech power spectra falls below this threshold. Because the noise power spectrum is estimated by the decision-directed method from the beginning of the noisy speech, the proposed method performs better only in white noise environments.

The cross term between the clean speech and noise spectra is not zero, so the assumption that the power spectrum of the noisy speech is the sum of the clean speech and noise power spectra is inaccurate. The cross term is estimated from the vector relationships among the spectra of the noisy speech, clean speech, and noise in the complex plane, and it is a function of the instantaneous a priori and a posteriori signal-to-noise ratios (SNRs).
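The size of the cross term can be checked numerically for a single frequency bin. The sketch below is only an illustration of the geometry in the complex plane, not the estimator derived in the dissertation: it assumes the noisy bin is the sum of a clean bin and a noise bin, and the bin values and the helper name `cross_term` are hypothetical.

```python
import cmath
import math


def cross_term(clean_bin: complex, noise_bin: complex) -> float:
    """Return 2*|X|*|D|*cos(theta_X - theta_D) for one frequency bin."""
    phase_gap = cmath.phase(clean_bin) - cmath.phase(noise_bin)
    return 2.0 * abs(clean_bin) * abs(noise_bin) * math.cos(phase_gap)


# Hypothetical single-bin values chosen only for illustration.
clean_bin = cmath.rect(2.0, math.pi / 6)    # X: magnitude 2, phase 30 degrees
noise_bin = cmath.rect(1.0, -math.pi / 4)   # D: magnitude 1, phase -45 degrees
noisy_bin = clean_bin + noise_bin           # Y = X + D

additive_estimate = abs(clean_bin) ** 2 + abs(noise_bin) ** 2
exact_power = abs(noisy_bin) ** 2
residual = exact_power - additive_estimate  # the part the additive model omits

print(f"|Y|^2 = {exact_power:.4f}, |X|^2 + |D|^2 = {additive_estimate:.4f}")
print(f"cross term = {cross_term(clean_bin, noise_bin):.4f}")
```

The residual equals the cross term exactly, and it vanishes only when the clean and noise components are 90 degrees apart in phase, which is why the additive model holds only approximately in any single frame.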
Using this speech model, we propose a new speech enhancement method based on the sparse representation of the power spectrum. We use the minima controlled recursive averaging (MCRA) method to estimate the noise power spectrum. The l2 norm of the sum of the cross term and the noise power spectrum is used as the termination rule of the LARS algorithm, from which the sparse representation of the clean speech power spectrum is obtained. The dictionary is again trained by the approximate K-SVD method with a nonnegativity constraint. Additionally, we present a new estimate of the instantaneous SNR that uses the speech power spectrum of the current frame rather than that of the previous frame. Because the speech signal varies between the previous and current frames, estimating the instantaneous SNR from the current frame is important for speech enhancement. Because the proposed method uses a more reasonable speech model and termination rule, it adapts to most noise environments and performs better at low SNR.

Most speech enhancement methods are implemented with a gain function in the frequency domain, which requires estimating the speech and noise power spectra simultaneously. The performance of a speech enhancement system therefore depends partly on the accuracy of the noise power spectrum estimate. Traditionally, the noise power spectrum is estimated from the beginning or the silent segments of the noisy signal, which are detected by a voice activity detector (VAD). Detection works well only for stationary noise, and errors increase at low SNR. In nonstationary noise environments the noise power spectrum changes rapidly, so the estimate should be updated as quickly as possible. Overestimating or underestimating the true noise power spectrum reduces intelligibility or produces musical noise.
Building on the unbiased minimum mean-square error (MMSE) noise power estimation method, which has low complexity and low tracking delay, we propose a noise power spectrum estimation method based on speech presence probability. Using the magnitude-squared spectrum model, the new method updates the noise power spectrum estimate with the a posteriori speech presence probability, which is determined from the a posteriori signal-to-noise ratio uncertainty. The maximum of the estimated power spectrum is close to that of the unbiased estimation method, and the minimum is improved. The method estimates background noise effectively without introducing speech distortion. It tracks the noise power spectrum accurately and follows abrupt changes in the noise spectrum quickly. The quality of the enhanced speech is improved to a certain extent in both stationary and nonstationary noise scenarios.

It is well known that the ear has no preference among changes in the phase, or in the relative phase, of sinusoidal signals. However, some researchers believe that rapid fluctuation of the relative phase among the sinusoidal components of a speech signal degrades speech quality and that the phase of a signal carries a great deal of information. Speech enhancement methods based on the amplitude spectrum nevertheless ignore the phase spectrum; they estimate the amplitude spectrum on the assumption that phase information cannot improve speech quality. A growing number of researchers now pay attention to the importance of phase in speech enhancement. We propose a phase estimate based on the MMSE spectral amplitude estimation given the phase. A specific expression for the phase difference is derived using the instantaneous a priori and a posteriori SNRs, and the phase of the clean speech is then estimated from the inverse cosine function and the noisy phase.
This phase estimate complements and extends the algorithm based on MMSE spectral amplitude estimation given the phase. Moreover, combining the proposed phase estimation method with other amplitude spectrum estimators can improve the quality of the enhanced speech.
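The link between the instantaneous SNRs and the phase difference can be illustrated with the law of cosines applied to the triangle formed by the noisy, clean, and noise spectra. The sketch below is a geometric illustration under the assumption Y = X + D, not necessarily the expression derived in the dissertation. The helper name `phase_difference` and the bin values are hypothetical, and the true clean and noise bins are used here only to form the SNRs; in practice these would be estimates.

```python
import cmath
import math


def phase_difference(xi: float, gamma: float) -> float:
    """Magnitude of theta_Y - theta_X from the law of cosines.

    xi    = |X|^2 / |D|^2  (instantaneous a priori SNR)
    gamma = |Y|^2 / |D|^2  (instantaneous a posteriori SNR)
    """
    cosine = (gamma + xi - 1.0) / (2.0 * math.sqrt(gamma * xi))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding or estimation error
    return math.acos(cosine)


# Hypothetical single-bin values chosen only for illustration.
clean_bin = cmath.rect(2.0, 0.3)   # X: magnitude 2, phase 0.3 rad
noise_bin = cmath.rect(1.0, 1.5)   # D: magnitude 1, phase 1.5 rad
noisy_bin = clean_bin + noise_bin  # Y = X + D

xi = abs(clean_bin) ** 2 / abs(noise_bin) ** 2
gamma = abs(noisy_bin) ** 2 / abs(noise_bin) ** 2

gap = phase_difference(xi, gamma)
noisy_phase = cmath.phase(noisy_bin)
candidates = (noisy_phase - gap, noisy_phase + gap)  # acos leaves the sign open
print(candidates, cmath.phase(clean_bin))
```

The inverse cosine recovers only the magnitude of the phase difference, so one of the two candidates matches the clean phase. Choosing the sign requires additional information that this sketch does not model.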
Keywords/Search Tags: Speech enhancement, Sparse representation, Dictionary training, Noise estimation, Phase estimation