Multisensor segmentation-based noise suppression for intelligibility improvement in MELP coders

Posted on: 2007-08-26
Degree: Ph.D.
Type: Thesis
University: Georgia Institute of Technology
Candidate: Demiroglu, Cenk
Full Text: PDF
GTID: 2458390005481559
Subject: Engineering
Abstract/Summary:
This thesis investigates the use of an auxiliary sensor, the GEMS device, for improving the quality of noisy speech and for designing noise preprocessors for MELP speech coders. The use of auxiliary sensors in noise-robust ASR applications is also investigated, with the aim of developing speech enhancement algorithms that exploit acoustic-phonetic properties of the speech signal.

A Bayesian risk minimization framework is developed that can incorporate the acoustic-phonetic properties of speech sounds and knowledge of human auditory perception into speech enhancement. Two noise suppression systems are presented using the ideas developed in this mathematical framework. In the first system, an aharmonic comb filter is proposed for voiced speech, in which low-energy frequencies are severely suppressed while high-energy frequencies are suppressed mildly. The proposed system outperformed an MMSE estimator in subjective listening tests and in a DRT intelligibility test for MELP-coded noisy speech. The effect of aharmonic comb filtering on the linear predictive coding (LPC) parameters is analyzed using a missing data approach. Suppressing the low-energy frequencies without any modification of the high-energy frequencies is shown to improve the LPC spectrum, as measured by the Itakura-Saito distance.

The second system combines the aharmonic comb filter with the acoustic-phonetic properties of speech to improve the intelligibility of MELP-coded noisy speech. The noisy speech signal is segmented into broad sound classes using a multi-sensor automatic segmentation/classification tool, and each sound class is enhanced differently based on its acoustic-phonetic properties. The proposed system is shown to outperform both the MELPe noise preprocessor and the aharmonic comb filter in intelligibility tests when used in concatenation with the MELP coder.

Since the second noise suppression system relies on an automatic segmentation/classification algorithm, the use of the GEMS signal in an automatic segmentation/classification task is also addressed using an ASR approach. Current ASR engines can segment and classify speech utterances in a single pass; however, they are sensitive to ambient noise. Features extracted from the GEMS signal can be fused with the noisy MFCC features to improve the noise robustness of the ASR system. In the first phase, a voicing feature is extracted from the clean speech signal and fused with the MFCC features. The actual GEMS signal could not be used in this phase because there was insufficient sensor data to train the ASR system. (Abstract shortened by UMI.)
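The two ideas the abstract names for the first system, energy-dependent suppression and the Itakura-Saito distance, can be illustrated with a minimal Python sketch. This is not the thesis's aharmonic comb filter: the function names, the -20 dB threshold relative to the frame peak, and the two gain values are hypothetical choices made only for illustration. The Itakura-Saito distance uses its standard definition between two strictly positive power spectra.

```python
import math


def itakura_saito(p_ref, p_est):
    """Mean Itakura-Saito distance between two power spectra.

    Both inputs are equal-length sequences of strictly positive values.
    The measure is zero when the spectra are identical and is not symmetric.
    """
    if len(p_ref) != len(p_est):
        raise ValueError("spectra must have the same length")
    total = 0.0
    for p, q in zip(p_ref, p_est):
        ratio = p / q
        total += ratio - math.log(ratio) - 1.0
    return total / len(p_ref)


def energy_dependent_gain(power, threshold_db=-20.0,
                          mild_gain=0.9, severe_gain=0.1):
    """Toy energy-dependent suppression of one frame's power spectrum.

    Bins within `threshold_db` of the frame peak get the mild amplitude
    gain; bins below it get the severe amplitude gain. All three
    parameters are assumed values, not taken from the thesis.
    """
    peak = max(power)
    floor = peak * 10.0 ** (threshold_db / 10.0)
    # Gains are amplitude gains, so they are squared when applied to power.
    return [p * (mild_gain if p >= floor else severe_gain) ** 2
            for p in power]


if __name__ == "__main__":
    frame = [1.0, 0.001, 0.5]            # made-up power spectrum
    suppressed = energy_dependent_gain(frame)
    print(suppressed)
    print(itakura_saito(frame, suppressed))
```

The numbers in the usage lines are made up and say nothing about the results reported in the thesis; the sketch only shows how a per-bin gain rule and the distance measure fit together.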
Keywords/Search Tags: Speech, Noise suppression, ASR, MELP, GEMS, System, Intelligibility, Aharmonic comb filter