
Study On Statistical Model Based Speech Enhancement Algorithms

Posted on: 2012-02-18 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: H Y Wang
GTID: 1118330368478933 | Subject: Communication and Information System
Abstract/Summary:
Environmental noise is a pervasive impairment for speech processing equipment such as speech recognition and speaker recognition systems. These systems perform well in the absence of environmental noise, but their performance degrades considerably in real noisy environments. Removing background noise from speech is one of the most challenging problems in speech signal processing, because natural environmental noise is highly diverse and the speech signal itself is complex. Speech enhancement algorithms also differ according to the application environment, and they can be classified in several ways. By the number of input channels, they are divided into single-channel, dual-channel, and multi-channel algorithms. By the signal processing domain, they are divided into time-domain and frequency-domain algorithms. By their mode of operation, they are divided into non-adaptive and adaptive algorithms. Single-channel speech enhancement is used mainly in mobile communications, hearing aids, and similar applications. Single-channel systems usually exploit the different statistical properties of speech and noise; they require fewer resources than multi-input systems but perform poorly in non-stationary noise. Achieving good performance with single-channel speech enhancement is therefore one of the most difficult problems in this field.
Recently, many algorithms have appeared to overcome these limitations.

Because a fixed frame length leads to poor enhancement in transition regions and unvoiced segments, this dissertation proposes a speech enhancement algorithm based on voiced/unvoiced separation. A voiced/unvoiced separation algorithm first separates the voiced and unvoiced speech signals. The spectral amplitude distributions of the voiced and unvoiced signals are then estimated separately, and distribution functions closer to the true distributions are identified. During enhancement, each part of the speech is processed according to its own spectral distribution function. Simulation results show that this separate processing effectively improves the signal-to-noise ratio (SNR), especially in the transition and unvoiced parts of the speech signal.

Statistical model based speech enhancement algorithms estimate the clean speech signal from the noisy speech signal. Such algorithms usually require exact knowledge of the joint statistics of the clean speech and noise signals, together with a meaningful speech distortion measure. If speech and noise are statistically independent, it suffices to know the probability distributions of the clean speech and of the noise separately. In practice, however, we know neither the speech and noise statistics nor the best distortion measure. In principle, therefore, the speech and noise statistics must be learned from training data, which requires an optimal algorithm for estimating the statistical model; the resulting model can then be used with an existing distortion measure in the enhancement algorithm.

Speech enhancement algorithms generally assume that the spectral components of the speech signal are statistically independent and that the short-time spectral amplitude follows a Rayleigh distribution (as it does when the complex spectral coefficients are modeled as zero-mean Gaussian).
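Under the Rayleigh amplitude assumption, the classic MMSE short-time spectral amplitude estimator is the Ephraim-Malah gain G = (sqrt(pi)/2) * (sqrt(v)/gamma) * exp(-v/2) * [(1 + v) I0(v/2) + v I1(v/2)], with v = xi*gamma/(1 + xi). The sketch below computes this baseline gain; it is not the method proposed in this dissertation. It assumes the a priori SNR xi and the a posteriori SNR gamma have already been estimated for each frequency bin. The function names are illustrative, and the scaled Bessel functions are evaluated by numerical integration so that only NumPy is needed (SciPy's `scipy.special.i0e` and `i1e` would serve the same purpose).

```python
import numpy as np


def scaled_bessel(n, x, num=4001):
    """Return exp(-x) * I_n(x) for x >= 0 (modified Bessel function, scaled).

    Uses I_n(x) = (1/pi) * integral_0^pi exp(x cos t) cos(n t) dt, evaluated with
    the trapezoidal rule; exp(x (cos t - 1)) cannot overflow for x >= 0.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.linspace(0.0, np.pi, num)
    f = np.exp(x[:, None] * (np.cos(t)[None, :] - 1.0)) * np.cos(n * t)[None, :]
    h = t[1] - t[0]
    integral = h * (f.sum(axis=1) - 0.5 * (f[:, 0] + f[:, -1]))
    return integral / np.pi


def mmse_stsa_gain(xi, gamma):
    """Ephraim-Malah MMSE spectral amplitude gain under a Rayleigh speech prior.

    xi: a priori SNR, gamma: a posteriori SNR (linear scale, both > 0).
    """
    xi, gamma = np.broadcast_arrays(np.asarray(xi, float), np.asarray(gamma, float))
    v = xi * gamma / (1.0 + xi)
    half = (v / 2.0).ravel()
    # exp(-v/2) * I_n(v/2) is exactly the scaled Bessel value
    i0e = scaled_bessel(0, half).reshape(v.shape)
    i1e = scaled_bessel(1, half).reshape(v.shape)
    return (np.sqrt(np.pi) / 2.0) * (np.sqrt(v) / gamma) * ((1.0 + v) * i0e + v * i1e)


# Example: gain for three a priori SNRs at a fixed a posteriori SNR
print(mmse_stsa_gain(np.array([0.1, 1.0, 10.0]), 2.0))
```

At high SNR this gain approaches the Wiener gain xi/(1 + xi), which is a convenient sanity check when comparing it with gains derived from other amplitude priors.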
Many speech enhancement algorithms therefore aim to find a more accurate model of the speech signal. Models such as the Gamma, Laplace, and generalized super-Gaussian distributions have been shown to fit better than the Rayleigh model. Although enhancement algorithms based on these models have made some progress, a single distribution function cannot closely approximate the histogram of the speech signal. To address this, this dissertation models the speech spectral amplitude with a super-Gaussian mixture model whose parameters are estimated with the expectation-maximization (EM) algorithm. This mixture model approximates the histogram of the speech spectral amplitude well. Using it as the speech prior, the minimum mean square error (MMSE) estimator of the short-time spectral amplitude is derived. Analysis of the resulting gain curves shows that the super-Gaussian mixture model improves enhancement performance for low-energy speech components.

However, given the way speech is produced and its non-stationary character, no single distribution suits the entire speech signal. Enhancement therefore cannot be improved simply by replacing one model with another; a more flexible model, or model estimation algorithm, is needed that adapts to the characteristics of the speech signal itself. The hidden Markov model (HMM) is such a model and is widely used in speech recognition. HMMs have also been applied to speech enhancement, but that work is not well developed, or the HMM is used only to model the noise signal.
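The abstract does not give the exact form of the super-Gaussian mixture components. As a minimal sketch, assume each component is a Gamma density with a shared, fixed shape parameter (a super-Gaussian amplitude prior of the form a^nu * exp(-mu * a / sigma) is a Gamma density with shape nu + 1 and scale sigma/mu). EM then estimates the mixture weights and the per-component scales in closed form. The function name, parameters, and initialization are illustrative assumptions, not the dissertation's implementation.

```python
import math

import numpy as np


def em_gamma_mixture(a, n_components=2, shape=1.5, n_iter=500, tol=1e-8):
    """EM for p(a) = sum_k w_k * Gamma(a; shape, scale_k) with a fixed common shape.

    a: positive spectral amplitudes. Returns (weights, scales, log-likelihood history).
    """
    a = np.asarray(a, dtype=float)
    log_a = np.log(a)
    # Initialise scales from spread-out quantiles so the components start apart.
    scales = np.quantile(a, np.linspace(0.2, 0.8, n_components)) / shape
    weights = np.full(n_components, 1.0 / n_components)
    history = []
    for _ in range(n_iter):
        # E-step: log w_k + log Gamma(a_i; shape, scale_k), normalised per sample
        log_pdf = ((shape - 1.0) * log_a[:, None] - a[:, None] / scales[None, :]
                   - shape * np.log(scales)[None, :] - math.lgamma(shape))
        log_joint = np.log(weights)[None, :] + log_pdf
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        resp = np.exp(log_joint - log_norm[:, None])
        history.append(float(log_norm.sum()))
        # M-step: weights, and scale_k = sum_i r_ik a_i / (shape * sum_i r_ik)
        nk = resp.sum(axis=0)
        weights = nk / a.size
        scales = (resp * a[:, None]).sum(axis=0) / (shape * nk)
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol * abs(history[-2]):
            break
    return weights, scales, history


# Example on synthetic amplitudes drawn from two Gamma components
rng = np.random.default_rng(0)
amps = np.concatenate([rng.gamma(1.5, 0.5, 12000), rng.gamma(1.5, 4.0, 8000)])
w, s, hist = em_gamma_mixture(amps, n_components=2, shape=1.5)
print(w, s)
```

Holding the shape fixed keeps the M-step closed form; estimating the shape as well would require a numerical update, for example Newton's method on the digamma equation.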
Here, speech is assumed to have different characteristics in different HMM states, so HMM parameters trained on clean speech should adapt to the characteristics of the speech signal. During parameter estimation, this dissertation adds constraints on the joint probability of a speech frame; this prevents the training process from producing infinite or zero values and yields reasonable spectral amplitude distributions. A speech enhancement algorithm based on this new model is then proposed. When noisy speech is processed with a statistical model based enhancement algorithm, it is not possible to determine exactly which HMM state each frame of the noisy speech belongs to. An adaptive method is therefore used to choose the most suitable statistical model of the spectral amplitude for the noisy speech signal. The proposed algorithm improves the SNR of the speech signal and overcomes the limitation of using only a single spectral amplitude distribution function.
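The abstract specifies neither its probability constraints nor its adaptive selection rule. A minimal sketch, assuming the per-frame log-likelihood of each HMM state's amplitude model is already available: a scaled forward pass gives the filtered posterior P(state_t | frames 1..t), and the most probable state then selects which spectral amplitude model to apply to that frame. The `max_range` floor on relative likelihoods is a hypothetical stand-in for a constraint that keeps probabilities away from exactly zero; it is not the dissertation's constraint.

```python
import numpy as np


def forward_state_posteriors(log_emission, trans, init, max_range=50.0):
    """Filtered HMM state posteriors P(state_t | x_1..x_t) via a scaled forward pass.

    log_emission: (T, S) log-likelihood of each frame under each state's model.
    trans: (S, S) row-stochastic transition matrix; init: (S,) initial probabilities.
    max_range: hypothetical floor; a state's per-frame likelihood never drops below
    exp(-max_range) times the best state's likelihood, so no state probability
    collapses to exactly zero and the normaliser stays positive.
    """
    log_e = np.asarray(log_emission, dtype=float)
    trans = np.asarray(trans, dtype=float)
    T, S = log_e.shape
    post = np.empty((T, S))
    pred = np.asarray(init, dtype=float)  # P(state_t | x_1..x_{t-1})
    for t in range(T):
        # Relative likelihoods: subtract the frame maximum, then apply the floor.
        rel = np.exp(np.maximum(log_e[t] - log_e[t].max(), -max_range))
        alpha = pred * rel
        post[t] = alpha / alpha.sum()  # scaling step: normalise to a distribution
        pred = post[t] @ trans         # one-step prediction for the next frame
    return post


# Example: two states, frames first favour state 0, then state 1
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
log_em = np.array([[0.0, -5.0]] * 3 + [[-5.0, 0.0]] * 3)
post = forward_state_posteriors(log_em, trans, np.array([0.5, 0.5]))
best_state = post.argmax(axis=1)  # model chosen for each frame
print(best_state)
```

A soft alternative to taking the argmax is to weight each state's spectral gain by its posterior probability for that frame.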
Keywords/Search Tags:Speech Enhancement, MMSE, MAP, Hidden Markov model, Super-Gaussian mixture distribution