
Single Channel Speech Enhancement Used For Mobile Communication

Posted on: 2015-05-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Y Xia
Full Text: PDF
GTID: 1228330452453227
Subject: Circuits and Systems
Abstract/Summary:
In recent years, single channel speech enhancement has been widely used in mobile speech communication systems. In complex noise environments, however, the performance of state-of-the-art methods cannot meet the requirements of real-world applications.

This dissertation focuses on improving noise estimation, integrating speech enhancement methods, applying artificial neural networks to speech enhancement, and compressed domain speech enhancement, which is adopted in the network equipment of mobile communication systems. The main research contributions are summarized as follows:

1. To improve the tracking ability of noise estimation when the noise intensity changes suddenly, a noise estimation acceleration method is proposed based on Minima Controlled Recursive Averaging (MCRA). First, burst detection is performed on the power spectrum. If a sudden change is detected, a hangover period of adaptive length is activated. Then, within the hangover period, multi-parameter Voice Activity Detection (VAD) is used to detect the presence of speech. Finally, the update decision for the noise estimate is made with the help of the ratio between the noise estimate and the power spectrum minimum. Test results under ITU-T G.160 show that the acceleration method has no effect on speech enhancement performance when the noise level is stationary. When the noise level changes abruptly, the convergence time of noise reduction is markedly reduced, and the musical noise in the convergence period is effectively removed.

2. To combine the characteristics of different speech enhancement methods, a wavelet fusion method for speech enhancement is proposed. The noisy speech is first decomposed into several sub-bands with a bi-orthogonal wavelet packet transform. Then, the Weighted Euclidean Distortion Measure (WEDM) spectral amplitude estimator and over-subtraction type wavelet thresholding are applied in each sub-band. Next, using a fusion rule based on the cross-correlation and the a priori SNR, the output wavelet coefficients of the two methods are combined in each sub-band. Finally, the enhanced speech is obtained by the inverse wavelet packet transform. Tests under ITU-T G.160 show that the proposed method obtains better speech quality than the reference methods.

3. By introducing a weighted reconstruction loss function into the conventional Denoising Auto-encoder, a Weighted Denoising Auto-encoder (WDA) is proposed and employed to model the relationship between the power spectra of clean and noisy speech. A Wiener filtering based speech enhancement method with WDA and noise classification is proposed. The WDA is first employed to estimate the clean power spectrum. Then, the a Posteriori SNR Controlled Recursive Averaging (PCRA) approach is used to estimate the a priori SNR. Finally, the enhanced speech is obtained by Wiener filtering in the frequency domain. In addition, Gaussian Mixture Model (GMM) based online noise classification is introduced to make the proposed method suitable for various noise environments. Test results under ITU-T G.160 show that, compared with the conventional frequency domain Wiener filter, the WDA based methods achieve better objective speech quality, whether or not the noise conditions are included in the training set.

4. Based on the bit-stream of the ITU-T G.722.2 codec, a compressed domain speech enhancement method is proposed that modifies the codebook gains and is compatible with the discontinuous transmission (DTX) mode and frame erasure conditions. In non-DTX mode, VAD and noise classification are first carried out in the compressed domain. Then, the noise intensity is estimated based on the algebraic codebook power, and the a priori SNR is estimated according to the noise type. Next, the codebook gains are jointly modified and re-quantized. For non-speech frames in DTX mode, the logarithmic frame energy is attenuated to remove the noise, while the spectral envelope is kept unchanged. When frame erasure occurs, the recovered algebraic codebook gain is exponentially attenuated, and based on the reconstructed algebraic codebook vector, all the codec parameters are re-quantized. Test results under ITU-T G.160 show that, with much lower computational complexity, the proposed method achieves better noise reduction, SNR improvement, and objective and subjective speech quality than state-of-the-art compressed domain methods.
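Two building blocks named in the contributions above, recursive averaging of the noise estimate (contribution 1) and frequency domain Wiener filtering driven by the a priori SNR (contribution 3), can be illustrated with a minimal sketch. It uses the textbook forms only: a speech-presence-controlled smoothing update in the style of MCRA, and the Wiener gain G = xi / (1 + xi). It does not reproduce the dissertation's acceleration method, its PCRA estimator, or the WDA network. The function names `update_noise_psd` and `wiener_gain`, the parameters `alpha_d` and `floor`, and the toy input values are assumptions made for illustration.

```python
def update_noise_psd(noise_psd, noisy_psd, speech_prob, alpha_d=0.95):
    """One frame of speech-presence-controlled recursive averaging.

    Hypothetical helper: per frequency bin, the smoothing factor moves from
    alpha_d (speech absent, p = 0) to 1.0 (speech present, p = 1), so the
    noise estimate is frozen wherever speech is likely.
    """
    updated = []
    for lam, y, p in zip(noise_psd, noisy_psd, speech_prob):
        alpha = alpha_d + (1.0 - alpha_d) * p
        updated.append(alpha * lam + (1.0 - alpha) * y)
    return updated


def wiener_gain(clean_psd_est, noise_psd, floor=1e-12):
    """Per-bin Wiener gain G = xi / (1 + xi), with xi the a priori SNR.

    Hypothetical helper: xi is taken here as the ratio of an estimated clean
    power spectrum to the noise power spectrum; `floor` guards against
    division by zero.
    """
    gains = []
    for s, n in zip(clean_psd_est, noise_psd):
        xi = max(s, 0.0) / max(n, floor)
        gains.append(xi / (1.0 + xi))
    return gains


# Toy two-bin example: speech is absent in bin 0 and present in bin 1.
noise = update_noise_psd([1.0, 1.0], [4.0, 4.0], [0.0, 1.0], alpha_d=0.5)
gains = wiener_gain([3.0, 0.0], noise)
```

In the toy example the noise estimate in bin 0 moves toward the observed power while bin 1 stays frozen, and a bin whose estimated clean power is zero is fully suppressed. In a complete system the gains would multiply the noisy short-time spectrum before the inverse transform.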
Keywords/Search Tags: Single channel speech enhancement, noise estimation, wavelet fusion, weighted denoising auto-encoder, compressed domain