
Acoustic-feature-based frequency warping for speaker normalization

Posted on: 2000-03-16
Degree: Ph.D.
Type: Dissertation
University: Carnegie Mellon University
Candidate: Gouvea, Evandro Bacci
Full Text: PDF
GTID: 1468390014465477
Subject: Engineering
Abstract/Summary:
Speaker-dependent automatic speech recognition systems are known to outperform speaker-independent systems when enough training data are available to overcome the variability of acoustical properties among speakers. Speaker normalization techniques modify the spectral representation of incoming speech waveforms in an attempt to reduce variability between speakers.

In this work we study the possible benefits of using acoustic features that are believed to be key to speech perception in speaker normalization algorithms based on frequency warping. We study the extent to which the use of such features, specifically including the first three formant frequencies, can improve recognition accuracy and reduce computational complexity for speaker normalization compared to conventional techniques. We examine the characteristics and limitations of several types of feature sets and warping functions, and compare their performance to that of existing algorithms.

We have found that the specific shape of the warping function appears to be irrelevant in terms of improvement in recognition accuracy. The use of a linear function, the simplest choice, allowed us to employ linear regression to determine which features to use and how to weight them. We present a method that finds the optimal set of weights for a set of speakers given the slope of the best warping function for each speaker. Selecting a limited subset of features is a special case of this method in which the weights are restricted to one or zero.

Applying our speaker normalization algorithm to the ARPA Resource Management task resulted in sizable improvements compared to previous techniques. Speaker normalization applied to the ARPA Wall Street Journal (WSJ) and Broadcast News (Hub 4) tasks resulted in more modest improvements. We have investigated the possible causes of this difference. Our experiments indicate that normalization is less effective with a larger number of speakers, presumably because in this case the output probability densities of the HMMs tend to be broader and hence representative of a large class of speakers. In addition, increasing the vocabulary size tends to increase the search space, causing correct hypotheses to be replaced by erroneous ones. The benefit brought about by normalization is thus diluted.

While a number of recent successful speaker normalization algorithms have incorporated speaker-specific frequency warping into the initial signal processing, these algorithms do not make extensive use of acoustic features contained in the incoming speech.

The amount of improvement provided by normalization also increases with increasing sentence duration in Hub 4. Since the actual Hub 4 data contain a large number of short segments, normalization provides a more limited improvement in performance.
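
The regression idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the choice of per-speaker mean formant frequencies as the feature vector, the absence of an intercept term, and the treatment of linear warping as a pure scaling of the frequency axis are all assumptions made for illustration. Given one row of features per speaker and the slope of the best linear warping function found for that speaker, ordinary least squares yields the feature weights.

```python
import numpy as np

def fit_feature_weights(features, best_slopes):
    """Least-squares weights w such that features @ w approximates best_slopes.

    features:    (n_speakers, n_features) array, e.g. mean F1, F2, F3 in Hz
                 (an assumed feature choice, for illustration only).
    best_slopes: (n_speakers,) array with the slope of the best linear
                 warping function for each speaker, assumed to be known.
    """
    features = np.asarray(features, dtype=float)
    best_slopes = np.asarray(best_slopes, dtype=float)
    weights, *_ = np.linalg.lstsq(features, best_slopes, rcond=None)
    return weights

def predict_slope(features, weights):
    """Predict a warping slope for a speaker from that speaker's features."""
    return np.asarray(features, dtype=float) @ np.asarray(weights, dtype=float)

def warp_frequencies(freqs_hz, slope):
    """Linear frequency warping, modeled here as scaling the frequency axis."""
    return slope * np.asarray(freqs_hz, dtype=float)
```

In this sketch a new speaker's slope would be obtained with `predict_slope` from measured features and then applied with `warp_frequencies` before feature extraction. Restricting the weights to one or zero, as the abstract describes for feature selection, would require a search over feature subsets rather than the unconstrained fit shown here.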
Keywords/Search Tags: Normalization, Speaker, Frequency warping, Speech