
Speech Enhancement Under Condition Of Low SNR

Posted on: 2012-10-22    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z Tao    Full Text: PDF
GTID: 1228330368491177    Subject: Signal and Information Processing
Abstract/Summary:
Speech is the fastest, easiest, and most effective way for human beings to exchange information. However, speech applications in natural environments, including speech recognition, speech coding, speech conversion, and speech communication, are inevitably affected by the various noises that emanate from the surroundings. Noise seriously degrades the performance of speech-processing technologies and may even cause them to fail. Speech enhancement is an effective remedy for such noise pollution: its aim is to extract as pure a speech signal as possible from noisy speech by suppressing the background noise and improving the clarity and intelligibility of the speech. The technology has already been applied in hearing aids, cochlear implants, speech communication for blind persons, human-computer interaction systems, and mobile speech communications.

A variety of speech enhancement technologies have appeared in recent years. They achieve good results on speech with a high signal-to-noise ratio (SNR). However, when the SNR is low and the voice signal is weak, the enhanced speech is accompanied by unsuppressed residual noise and background noise, as well as substantial speech distortion. In this context, we study speech enhancement at low SNR. Our main work can be outlined as follows:

1. Traditional speech endpoint detection algorithms suffer from low accuracy and poor noise robustness at low SNR, so we study an endpoint detection algorithm based on the instantaneous energy frequency value (IEFV) of the Hilbert-Huang Transform (HHT).
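For context, the kind of traditional short-time-energy endpoint detector that the HHT-based approach aims to improve on can be sketched as follows. This is a minimal illustration only; the frame length, hop, threshold factor, and the function name are our own illustrative assumptions, not the dissertation's parameters:

```python
# A minimal short-time-energy endpoint detector: frame the signal,
# compute per-frame energy, and mark frames whose energy exceeds a
# noise-derived threshold as speech. Its weakness at low SNR motivates
# feature-based detectors such as the HHT/IEFV method.
import numpy as np

def energy_endpoints(x, frame_len=256, hop=128, noise_frames=5, factor=3.0):
    """Return a boolean speech/non-speech decision per frame."""
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.array([
        np.sum(x[i * hop:i * hop + frame_len] ** 2) for i in range(n_frames)
    ])
    # Estimate the noise floor from the first few frames (assumed silent).
    threshold = factor * np.mean(energy[:noise_frames])
    return energy > threshold

# Toy example: silence, a louder "speech" burst, then silence again.
rng = np.random.default_rng(0)
x = np.concatenate([
    0.01 * rng.standard_normal(2000),                          # leading silence
    0.5 * np.sin(2 * np.pi * 200 * np.arange(2000) / 8000),    # "speech"
    0.01 * rng.standard_normal(2000),                          # trailing silence
])
decisions = energy_endpoints(x)
```

The detector works when the noise floor is stationary and well below the speech energy; at low SNR the two energy distributions overlap, which is exactly the failure mode described above.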
This dissertation applies the HHT to separate the instantaneous amplitude and instantaneous frequency of the speech and extracts the IEFV, a joint time-amplitude-frequency feature. Because the IEFV distinguishes speech from noise effectively, it is used as the feature for endpoint detection. Experiments show that the detection accuracy for both the starting and ending points of speech is higher than that of the Zero-Energy-Product method, the Spectral Entropy method, and other comparable approaches.

2. We propose a noise power spectrum estimation algorithm based on variance-constrained spectral smoothing and a min-lifting scheme. According to the ratio between the smoothed power spectrum of a noisy-speech sub-band and its minimum, the method adjusts the time-frequency smoothing parameter based on the probability that speech is present in that sub-band. It estimates the noise spectrum from the weighted noisy-speech power spectrum, and it further smooths the noise spectrum using the variance of the smoothed noisy-speech power spectrum. Because the algorithm can update the noise power spectrum within very short speech gaps, it chiefly improves the adaptation speed. Experimental results indicate that the estimated noise spectrum not only adapts quickly to changes in the background noise but also remains accurate, with especially significant improvements under strong background noise and slowly changing noise.

3. To protect the unvoiced components of speech, we propose a speech enhancement procedure based on the auditory perception wavelet transform with adaptive thresholding. The method decomposes the noisy speech with the auditory perception wavelet transform, yielding the sub-band coefficients of the auditory perception wavelet.
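The minimum-tracking idea behind the noise estimator of point 2 can be sketched roughly as follows. This is an illustrative simplification with assumed smoothing constants; it uses the smoothed-power-to-minimum ratio as a soft speech-presence cue but omits the variance constraint and min-lifting details of the actual algorithm:

```python
# Sketch of minimum-tracking noise-PSD estimation: smooth the noisy
# power spectrum over time, track its minimum per frequency bin, and
# slow down the noise update in frames where the smoothed power is far
# above the tracked minimum (i.e. speech is probably present).
import numpy as np

def estimate_noise_psd(power_frames, alpha=0.85, ratio_thresh=5.0):
    """power_frames: (n_frames, n_bins) noisy-speech power spectra."""
    smoothed = power_frames[0].copy()
    minimum = power_frames[0].copy()
    noise = power_frames[0].copy()
    for t in range(1, power_frames.shape[0]):
        smoothed = alpha * smoothed + (1 - alpha) * power_frames[t]
        minimum = np.minimum(minimum, smoothed)
        # Speech-presence cue: large smoothed/minimum ratio => speech.
        speech = smoothed / np.maximum(minimum, 1e-12) > ratio_thresh
        beta = np.where(speech, 0.99, 0.8)   # update slowly during speech
        noise = beta * noise + (1 - beta) * power_frames[t]
    return noise

# Toy check: a constant noise floor of 1.0 with a burst of speech power.
frames = np.ones((50, 4))
frames[20:30] += 50.0                        # speech-dominated frames
noise_psd = estimate_noise_psd(frames)
```

Because the update keeps running (only slower) during speech, the estimate can also recover inside short speech gaps, which is the adaptation-speed property emphasized in point 2.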
For voiced speech, combining these coefficients with a voicing decision lets us remove the noise with an auditory masking method. For unvoiced speech, to avoid over-thresholding the wavelet coefficients, we apply a modified soft-thresholding rule. Experiments show that this method resolves the conflict between preserving the speech signal and removing the noise: the unvoiced speech is well preserved while the noise is suppressed.

4. We propose speech enhancement using a combined auditory nerve model and a quantum auditory neural network. When noisy speech with a low SNR passes through the auditory nerve model, it is enhanced into speech with a higher SNR. We then exploit the nonlinear mapping and self-learning ability of the quantum auditory neural network: the parameters extracted from each speech frame serve as the network's input, and its output serves as the spectral subtraction parameter for that frame, so the network optimizes the subtraction parameters. Speech enhancement is then achieved by estimating the speech gain with these optimized subtraction parameters. Experimental results indicate that the method reduces the distortion of the target speech through the adaptive self-learning ability of the network while achieving significant improvement both subjectively and objectively.

5. As is well known, whispered speech is a weak speech signal with a low SNR. Removing noise from whispers with algorithms such as spectral subtraction readily produces annoying "musical noise". This dissertation proposes a de-noising method based on a modified Mel masking model and the speech absence probability of noisy whispers.
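For reference, plain magnitude spectral subtraction, whose half-wave rectification is the usual source of the "musical noise" mentioned above, can be sketched as follows (a minimal illustration; the over-subtraction factor, spectral floor, and function name are our own assumptions):

```python
# Plain magnitude spectral subtraction: subtract a noise-magnitude
# estimate per bin and clamp negative results to a small spectral
# floor. The isolated random peaks that survive the clamping are
# heard as "musical noise" at low SNR.
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, over=1.0, floor=0.02):
    """Subtract a noise magnitude estimate per bin, flooring the result."""
    cleaned = noisy_mag - over * noise_mag
    return np.maximum(cleaned, floor * noisy_mag)

noisy = np.array([1.0, 0.3, 0.25, 2.0])   # one strong speech bin (last)
noise = np.array([0.3, 0.3, 0.3, 0.3])    # flat noise-magnitude estimate
out = spectral_subtract(noisy, noise)
```

Bins near or below the noise estimate collapse to the floor, while speech-dominated bins pass through largely intact; shaping this trade-off per band is what the masking-threshold method of point 5 addresses.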
Specifically, we propose whispered-speech enhancement using an auditory masking model in the modified Mel domain together with the Speech Absence Probability (SAP). In light of the phonation characteristics of whispered speech, we modify the Mel frequency scaling model and filter the whispered speech with it. Meanwhile, the masking threshold of each frequency band is determined dynamically by the speech absence probability, and the noisy whisper is de-noised by adaptively rectifying the spectral subtraction coefficient according to the different masking thresholds. Experiments show that, compared with other spectral subtraction methods, this method keeps the residual noise and background noise below the masking threshold of the human ear, produces smaller speech distortion, and thus yields a large improvement in subjective listening quality.
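The core idea of point 5, letting a per-band speech absence probability steer the subtraction coefficient, can be sketched as follows. The SAP model here is a simple a-posteriori-SNR heuristic and the parameter names are our own; the dissertation's masking-threshold formulation in the modified Mel domain is more elaborate:

```python
# SAP-weighted spectral subtraction: bands that are probably noise-only
# (high speech absence probability) get a larger over-subtraction
# factor, while speech-dominated bands are subtracted gently, limiting
# both residual noise and speech distortion.
import numpy as np

def sap_weighted_subtraction(noisy_power, noise_power,
                             over_min=1.0, over_max=4.0):
    snr_post = noisy_power / np.maximum(noise_power, 1e-12)
    # Heuristic SAP: a-posteriori SNR near 1 => speech probably absent.
    sap = 1.0 / (1.0 + np.maximum(snr_post - 1.0, 0.0))
    over = over_min + (over_max - over_min) * sap   # noise-only => over_max
    cleaned = noisy_power - over * noise_power
    return np.maximum(cleaned, 0.01 * noisy_power), sap

noisy = np.array([1.05, 9.0])   # band 0 ~ noise only, band 1 speech-dominated
noise = np.array([1.0, 1.0])
cleaned, sap = sap_weighted_subtraction(noisy, noise)
```

In the toy example, the noise-only band is subtracted aggressively down to the spectral floor while the speech-dominated band keeps most of its power, which is the qualitative behavior the SAP-controlled masking thresholds are designed to achieve.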
Keywords/Search Tags: speech enhancement, low SNR, variance-constrained spectral smoothing, auditory perception wavelet transform, quantum auditory neural network