Estimators of power spectrum and binary mask for improved speech intelligibility

Posted on: 2011-07-03
Degree: Ph.D
Type: Dissertation
University: The University of Texas at Dallas
Candidate: Lu, Yang
Full Text: PDF
GTID: 1448390002961683
Subject: Engineering
Abstract/Summary:
This dissertation explores approaches to speech enhancement that could improve speech intelligibility under adverse conditions. The research was inspired by the objective measures used to predict speech intelligibility: the speech features these measures rely on can be used not only to predict intelligibility but also to estimate the clean speech, and may thereby improve the intelligibility of corrupted speech.

The ideal binary mask (IdBM) assumes that the local signal-to-noise ratio (SNR) is known; when it is available, the mask can be used to restore speech intelligibility. Ideal binary masking is partly motivated by a widely used objective measure, the articulation index (AI), which assumes that speech intelligibility depends on the proportion of time during which the speech signal power exceeds the masker power. Motivated by the IdBM, the author proposed several statistical-model-based methods that model the binary masking and estimate the clean speech by estimating its power spectrum. Assuming a Gaussian model for the speech and noise power spectra, and assuming that the noisy observation equals the sum of the clean speech and noise power spectra, the proposed maximum a posteriori (MAP) estimator yields a binary mask. The author further proposed a statistical model for the instantaneous SNR and a soft mask that incorporates SNR uncertainty. Using these models, the author explored other objective functions for estimating the speech power spectrum and proposed a number of estimators. These estimators were evaluated with objective measures of speech quality, and all of them were found to improve speech quality significantly.

Inspired by another objective measure of speech intelligibility, the speech transmission index (STI), the author proposed machine-learning-based estimators of the binary mask that use the amplitude modulation spectrum (AMS) as the feature. The multi-layer perceptron (MLP) and the Gaussian mixture model (GMM) were used as the basic classifiers. The enhanced speech was evaluated with normal-hearing listeners and showed improved speech intelligibility at -5 and 0 dB global SNR.

Another speech enhancement method, which combines estimators of speech and of noise, was also proposed. This method approximates the signal-to-residual noise ratio (SNRESI) as a function of the a priori SNR and the gain function. The SNRESI was found to be a very important indicator of speech intelligibility improvement. By evaluating the approximated SNRESI, the method optimizes the selection between the speech estimator and the noise estimator to improve the output SNR.

To summarize, this dissertation proposed a number of new speech enhancement methods aimed at improving speech intelligibility. Improving speech intelligibility for normal-hearing listeners remains an extremely difficult problem. In a controlled environment, however, speech intelligibility can be improved, as shown by the machine-learning-based method of this dissertation.
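The ideal binary mask described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names `ideal_binary_mask` and `apply_mask`, the 0 dB threshold, and the toy power values are all assumptions chosen for illustration. The sketch keeps a time-frequency unit when its local SNR exceeds the threshold and zeroes it otherwise, using the true speech and noise powers that the ideal mask assumes are known.

```python
import math

def ideal_binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Return a 0/1 mask per time-frequency unit.

    A unit is kept (1) when its local SNR in dB exceeds threshold_db.
    The 0 dB default is an illustrative assumption, not a value from the source.
    """
    mask = []
    for speech_frame, noise_frame in zip(speech_power, noise_power):
        row = []
        for s, n in zip(speech_frame, noise_frame):
            if s <= 0.0:
                keep = False          # no speech energy: discard the unit
            elif n <= 0.0:
                keep = True           # speech without noise: local SNR is unbounded
            else:
                keep = 10.0 * math.log10(s / n) > threshold_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask

def apply_mask(noisy_power, mask):
    """Multiply each noisy power value by its mask value."""
    return [[p * m for p, m in zip(p_frame, m_frame)]
            for p_frame, m_frame in zip(noisy_power, mask)]

# Toy power spectra (rows are frames, columns are frequency bins); hypothetical data.
speech = [[4.0, 0.5], [1.0, 9.0]]
noise = [[1.0, 2.0], [1.0, 3.0]]
# Additive power-spectrum model, as assumed in the abstract.
noisy = [[s + n for s, n in zip(sf, nf)] for sf, nf in zip(speech, noise)]

mask = ideal_binary_mask(speech, noise)
enhanced = apply_mask(noisy, mask)
```

In practice the clean speech and noise powers are not available separately, which is why the dissertation estimates the mask from the noisy observation instead.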
Keywords/Search Tags:Speech, Improve, Power, Binary mask, Estimators, Dissertation, Used, SNR