
Reconstruction of incomplete spectrograms for robust speech recognition

Posted on: 2001-05-26
Degree: Ph.D.
Type: Thesis
University: Carnegie Mellon University
Candidate: Ramakrishnan, Bhiksha Raj
Full Text: PDF
GTID: 2468390014452207
Subject: Engineering
Abstract/Summary:
The performance of automatic speech recognition (ASR) systems degrades greatly when speech is corrupted by noise. Missing feature methods attempt to reduce this degradation by deleting components of a time-frequency representation of speech (such as a spectrogram) that exhibit low signal-to-noise ratio (SNR). Recognition is then performed using only the remaining components of the incomplete spectrogram. These methods have been shown to result in recognition accuracies that are very robust to the effects of additive noise. However, conventional missing feature methods, which modify the classifier used to perform the recognition, suffer from the drawback that they are constrained to use the log-spectral vectors of the spectrogram as features for recognition. It is well known that recognition systems that use log-spectral features perform poorly compared to systems that use cepstral features.

In this thesis we propose two new approaches that recast the missing feature paradigm as a data compensation problem, by reconstructing missing elements to obtain complete spectrograms. In the first approach, referred to as cluster-based reconstruction, incoming log-spectral vectors from clean speech are clustered. Missing spectrographic features from noisy data are recovered by first identifying the closest cluster based on the values of the features that are present, and then estimating the missing values using MAP procedures. The second approach, referred to as covariance-based reconstruction, uses MAP procedures to estimate the value of the missing components of the spectrogram based on their correlations with the elements that are present. Both methods take into account the bounds on the clean spectrogram imposed by additive noise.
In either case, cepstral features are computed from the reconstructed spectrograms and used for recognition without any modification of the speech recognition system.

When corrupt regions of the spectrogram are known a priori, recognition accuracies resulting from reconstruction methods are seen to be much higher than those obtained with the best current missing feature methods based on modification of the recognition system. The proposed spectrogram reconstruction methods are also computationally less expensive than the best conventional missing feature methods.

We also propose two methods that attempt to identify corrupt regions of the spectrographic representations of incoming speech. The first method utilizes noise spectrum estimates of vector Taylor series (VTS) compensation for noise-corrupted speech, while the second method treats the identification task as a classic Bayesian classification problem. Combination of the best method to identify corrupt regions with the best method to reconstruct them produces recognition accuracies better than any other known algorithm for speech in additive white noise. We also observe significant improvement in recognition accuracy for speech in the presence of background music if the locations of corrupted spectrographic regions are known a priori, but we have been less successful in blind identification of these corrupt regions for these signals.
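The two stages summarized in the abstract (marking low-SNR components as missing, then filling them in from the closest clean-speech cluster) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names `missing_mask` and `reconstruct`, the 0 dB threshold, the spectral-subtraction SNR estimate, and the use of diagonal-covariance Gaussian clusters are all assumptions made here for illustration. Under the diagonal assumption, the MAP estimate of a missing log-spectral component, bounded above by the observed noisy value, reduces to the smaller of the cluster mean and the observed value.

```python
import math


def missing_mask(noisy_power, noise_power, snr_threshold_db=0.0):
    """Flag components whose estimated local SNR is below a threshold.

    The SNR estimate (spectral subtraction) and the 0 dB default are
    illustrative assumptions, not values taken from the thesis.
    """
    floor = 1e-10  # avoids log of zero or of a negative number
    mask = []
    for y, n in zip(noisy_power, noise_power):
        clean = max(y - n, floor)  # crude estimate of clean power
        snr_db = 10.0 * math.log10(clean / max(n, floor))
        mask.append(snr_db < snr_threshold_db)  # True means "missing"
    return mask


def reconstruct(noisy_log, mask, clusters):
    """Fill in missing log-spectral components from the closest cluster.

    clusters: list of (means, variances) pairs describing diagonal
    Gaussians fitted to clean log-spectral vectors (an assumption; a
    diagonal model ignores correlations between components).
    """
    present = [i for i, m in enumerate(mask) if not m]

    def neg_log_likelihood(cluster):
        # Score a cluster using only the components that are present.
        means, variances = cluster
        return sum(
            (noisy_log[i] - means[i]) ** 2 / variances[i]
            + math.log(variances[i])
            for i in present
        )

    means, _ = min(clusters, key=neg_log_likelihood)

    # Additive noise can only raise the energy, so the observed value is
    # an upper bound on the clean value. For a diagonal Gaussian the
    # bounded MAP estimate is min(cluster mean, observed value).
    return [
        min(means[i], noisy_log[i]) if m else noisy_log[i]
        for i, m in enumerate(mask)
    ]
```

The reconstructed vector would then be passed to an ordinary cepstral front end, which is what lets the recognizer itself stay unmodified.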
Keywords/Search Tags:Recognition, Speech, Missing feature methods, Spectrogram, Corrupt regions, Reconstruction, Noise