Data-driven methods for extracting features from speech

Posted on: 2001-05-04
Degree: Ph.D
Type: Dissertation
University: Oregon Graduate Institute of Science and Technology
Candidate: Malayath, Narendranath
Full Text: PDF
GTID: 1468390014451778
Subject: Engineering
Abstract/Summary:
Feature extraction plays a major role in any form of pattern recognition. Current feature extraction methods used for automatic speech recognition (ASR) and speaker verification rely mainly on properties of speech production (modeled by all-pole filters) and perception (critical-band integration simulated by a Mel/Bark filter bank). We propose stochastic methods for designing feature extractors that are trained to reduce the unwanted variability present in the speech signal. In this dissertation we show that such data-driven methods provide significant advantages over conventional feature extraction methods.

In the first part of the dissertation, discriminant methods are introduced for extracting spectral features for ASR. Spectral basis functions that preserve phonetic class separability are derived using linear discriminant analysis (LDA). The discriminant basis functions are observed to analyze the low-frequency part of the spectrum with higher resolution than the high-frequency part. This trend is consistent with properties of human hearing that are explained by the notion of critical bandwidth and emulated in current feature extraction modules by a Mel/Bark filter bank. The proposed discriminant features are shown to outperform conventional features in ASR experiments.

The second part of the dissertation introduces data-driven methods for designing channel-normalizing filters for speaker verification. It has been observed that a reasonable verification error can be achieved if the speaker uses the same handset and telephone line for testing as in training. If, on the other hand, the speaker uses a different telephone handset for testing, the verification error can increase by a factor of four to five. We introduce a data-driven method for designing filters capable of normalizing the variability introduced by different telephone handsets. The filter design is based on the estimated second-order statistics of handset variability. The filter is applied to the logarithmic energy outputs of Mel-spaced filter banks. The effectiveness of the proposed channel-normalizing filter in improving speaker verification performance under mismatched conditions is also demonstrated.
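The abstract does not spell out how the discriminant spectral basis functions are computed, so the following is only a minimal sketch of textbook LDA applied to labeled spectral vectors, not the dissertation's actual procedure. The function name `lda_basis`, the synthetic 8-bin "spectra", the three toy classes, and the small ridge term are all assumptions made for illustration.

```python
import numpy as np

def lda_basis(X, y, n_components):
    """Textbook LDA: return basis vectors (as columns) and their eigenvalues.

    X: (n_frames, n_bins) spectral vectors; y: (n_frames,) class labels.
    """
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Assumption: a small ridge keeps Sw invertible on limited data.
    Sw += 1e-6 * np.trace(Sw) / d * np.eye(d)
    # Whiten Sw, then diagonalize the whitened between-class scatter.
    evals, evecs = np.linalg.eigh(Sw)
    W = evecs / np.sqrt(evals)  # scales column j by evals[j] ** -0.5
    bvals, bvecs = np.linalg.eigh(W.T @ Sb @ W)
    order = np.argsort(bvals)[::-1][:n_components]
    return W @ bvecs[:, order], bvals[order]

# Hypothetical toy data: three classes that differ only in the two lowest bins.
rng = np.random.default_rng(0)
means = np.zeros((3, 8))
means[1, 0] = 3.0
means[2, 1] = 3.0
X = np.vstack([rng.normal(m, 1.0, size=(200, 8)) for m in means])
y = np.repeat(np.arange(3), 200)

basis, vals = lda_basis(X, y, n_components=2)
features = X @ basis  # project each frame onto the discriminant basis
low_bin_energy = np.sum(basis[:2, 0] ** 2) / np.sum(basis[:, 0] ** 2)
```

In this toy setup the leading basis vector concentrates its energy in the bins where the classes actually differ, which is the sense in which a discriminant basis preserves class separability. Real log-spectral frames with phonetic labels would take the place of `X` and `y`.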
Keywords/Search Tags:Methods, Feature, Speech, Speaker verification, Filter