Robust automatic speech recognition algorithms for dealing with noise and accent

Posted on: 2010-11-19
Degree: Ph.D
Type: Dissertation
University: University of California, Los Angeles
Candidate: You, Hong
Full Text: PDF
GTID: 1448390002978913
Subject: Statistics
Abstract/Summary:
Although automatic speech recognition (ASR) systems have progressed significantly over the past five decades, many challenging problems remain. In addition to the intrinsic confusability between speech units, the environment, the speaker, and the speaking style all contribute to variations in speech signals, which pose one of the most challenging issues facing ASR research. Variability in speech requires both the signal processing and the pattern modeling components of an ASR system to adapt. This dissertation develops algorithms that improve the performance of speech recognition systems when dealing with variability in speech signals, specifically variability due to environmental noise and to accent. Environmental noise contributes significant speech variability that depends on the type of noise and the signal-to-noise ratio (SNR), so a noise-robust feature extraction technique is necessary for ASR to handle noisy speech. Variations in speech signals due to certain pronunciations can also degrade ASR performance, and an ASR system needs to compensate for these pronunciation variations.

For noise robustness, we explore feature extraction and frame selection algorithms that enhance the signal processing component to handle variability caused by noise. The algorithms are tested on several benchmark databases and compared with state-of-the-art noise-robust ASR systems; improved recognition accuracy is observed. For robustness to speaker accent, we focus on pronunciation modeling. We propose algorithms to analyze pronunciation variations in Spanish-accented speech at the pronunciation lexicon level. Since speech recognition systems rely heavily on hidden Markov models (HMMs), a confusability measure for HMMs is important. We propose a distance measure between HMMs that improves upon existing HMM confusability metrics in terms of ASR performance prediction, confusion pattern prediction, and pronunciation modeling.
Keywords/Search Tags: Speech, ASR, Noise, Algorithms, Pronunciation, Robust, Performance, Systems