Research On Speech Enhancement Based On Speech Modeling And Speech Quality Assessment

Posted on: 2010-01-22 Degree: Doctor Type: Dissertation
Country: China Candidate: W Yin
GTID: 1118330332985686 Subject: Communication and Information System
Abstract/Summary:
Depending on the processing manner, current speech enhancement techniques can be categorized into two major classes: model-free methods and model-based methods. The model-free techniques are deficient in several respects compared to the model-based methods. Some model-free techniques need two microphones to record both noise and speech. This is usually not possible, especially in on-line enhancement applications (e.g. hearing aids). One major source of the problems associated with model-free methods is the unrealistic assumption that the noise is relatively stationary. The results of model-free speech enhancement methods are usually unsatisfactory when noise characteristics change relatively fast. Furthermore, model-free techniques (e.g. the spectral subtraction method) may introduce an audible "musical" artifact that acts as signal-dependent interference.

Model-based methods estimate the clean speech signal in the time domain using the statistical or correlative characteristics of speech signals. Such methods can avoid the musical-noise problem from the very beginning, and some of them perform reasonably well for nonstationary noise. By utilizing dynamic modeling of the speech signal and stochastic signal processing approaches, the dissertation discusses several model-based methods and aims to improve the performance of speech enhancement. In addition, this thesis explores subjective and objective evaluation of speech quality. The main contents of this thesis are as follows:

1. A novel approach that incorporates the masking threshold into subband H∞ filtering is proposed for single-channel speech enhancement. No statistical assumptions have to be made about the driving process or the observation noise. Subband speech signals are obtained by subband decomposition. An iterative H∞ filtering scheme is then adopted to estimate low-order autoregressive (AR) parameters. The masking threshold of each corresponding subband is introduced to estimate the noise. This gives a further improvement over conventional H∞ filtering and reduces speech distortion. Simulation results show that the proposed method not only reduces the computational complexity but also achieves better performance in both objective and subjective tests.

2. A conventional HMM cannot explicitly model the different speech energy levels of a phone, which typically arise from differences in pronunciation and/or different vocalizations of individual speakers. This thesis proposes a unified solution to this problem using a parameterization and modeling of speech gains that is incorporated in the HMM framework. Through the introduction of gain variables, energy variation in speech is modeled in a unified framework. Time-invariant parameters of the speech gain models are obtained offline using training data, together with the remainder of the HMM parameters. The time-varying parameters are estimated online from the observed noisy speech signal. The speech signal is filtered with a fixed number of H∞ filters, and the estimated clean speech is obtained from the sum of the weighted filter outputs. Because the IMM (interacting multiple models) algorithm handles the interactions between the parallel filters efficiently, enhancement performance is improved without much increase in complexity. The results show that the proposed method leads to a significant reduction of background noise and has less speech distortion than conventional algorithms.

3. For speech signals corrupted by colored noise, a novel single-microphone speech enhancement algorithm based on the unscented particle filter (UPF) is proposed. It models speech signals and noise with time-varying autoregressive (TVAR) models. The unscented particle filter is applied to estimate the parameters of the AR model and to filter non-Gaussian noise. Instead of the most popular choice of proposal distribution, the unscented particle filter uses an unscented Kalman filter (UKF) to generate the importance proposal distribution. This allows the particle filter to incorporate the latest observations into the prior-updating routine, which greatly improves estimation performance with fewer particles. Simulation results demonstrate that the proposed algorithm performs well in the presence of colored noise.

4. The performance of several objective measures is evaluated in terms of predicting the quality of noisy speech enhanced by noise suppression algorithms. The measures are assessed over a wide range of distortions introduced by three types of real-world noise and by six classes of speech enhancement algorithms. The subjective quality ratings were obtained using the ITU-T P.835 methodology, which is designed to evaluate the quality of enhanced speech along three dimensions: signal distortion, noise distortion, and overall quality. The thesis reports the correlations of several objective measures with these three subjective rating scales. A new composite objective measure is proposed by combining the individual objective measures using multivariate adaptive regression analysis techniques. The composite objective measure correlates very well with the subjective quality.

5. A novel approach to output-based speech quality evaluation based on the Non-uniform Linear Prediction Cepstrum (NLPC) and GMM-HMM is proposed. First, spectrum warping is achieved by applying the Bark Bilinear Transform (BBT) to a uniform frequency grid, generating a grid that incorporates the non-uniform resolution properties of the human ear. The NLPC is then computed to model the warped spectrum by linear prediction. GMM-HMMs trained on features extracted from clean speech signals are used to form a model of normative behavior. A measure of consistency between the degraded coefficient vector and the clean coefficient model is obtained. Finally, using a multivariate nonlinear regression model, an objective forecast model is constructed to map the consistency measure to the subjective Mean Opinion Score (MOS). The simulation results indicate that the proposed output-based objective quality measure performs better than the ITU-T P.563 standard.
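The spectrum-warping step described for the output-based quality measure can be illustrated with a minimal sketch. It assumes the standard first-order all-pass (bilinear) frequency map; the function names and the warping factor alpha = 0.57 (an illustrative value for roughly Bark-like warping at 16 kHz sampling) are assumptions made here and are not taken from the thesis.

```python
import math


def warp_frequency(omega, alpha):
    """Warp a normalized angular frequency omega (radians, 0..pi) using the
    phase response of a first-order all-pass filter with coefficient alpha."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warped_grid(num_points, alpha=0.57):
    """Return (uniform, warped) grids with num_points samples on [0, pi].

    alpha=0.57 is an assumed, illustrative warping factor.
    """
    uniform = [math.pi * k / (num_points - 1) for k in range(num_points)]
    warped = [warp_frequency(w, alpha) for w in uniform]
    return uniform, warped


if __name__ == "__main__":
    # Print each uniform grid point next to its warped counterpart.
    uniform, warped = warped_grid(9)
    for w, v in zip(uniform, warped):
        print(f"{w:.3f} -> {v:.3f}")
```

With alpha > 0 the low-frequency end of the grid is stretched, which mirrors the finer low-frequency resolution of the ear; alpha = 0 leaves the grid unchanged, and the endpoints 0 and pi are fixed for any alpha between -1 and 1.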
Keywords/Search Tags: speech enhancement, H∞ filter, HMM, speech quality, non-uniform linear prediction cepstrum coefficient, GMM