
Duration normalization for robust recognition of spontaneous speech via missing feature methods

Posted on: 2005-01-03
Degree: Ph.D
Type: Thesis
University: Carnegie Mellon University
Candidate: Nedel, Jon P
Full Text: PDF
GTID: 2458390011450750
Subject: Engineering
Abstract/Summary:
Accurate recognition of spontaneous speech is one of the most difficult problems in speech recognition today. When speech is produced in a carefully planned manner, automatic speech recognition (ASR) systems are very successful at accurate recognition and transcription. In response to casual speech, ASR systems produce more than twice as many errors as they do on the same speech read carefully.

In this thesis, we have developed a practical algorithm to improve the recognition accuracy of ASR systems when transcribing spontaneous speech. We have found that normalizing the speech features so that every sound unit ("phone") has the same duration allows speech recognition models to characterize and recognize speech more accurately.

ASR systems use hidden Markov models (HMMs) to model the sound units from which speech signals are composed. It is well known that HMMs do not accurately model the average phonetic durations or the variability introduced into these durations by the casual production of speech. By normalizing the duration of every speech sound unit, we eliminate a source of variability in the modeling of speech that can contribute to increased word recognition errors.

When the boundaries between sound units are known a priori, the duration normalization approach achieves substantial improvements in recognition accuracy. Automatic identification of unknown boundary locations, however, has proven to be a difficult problem. When speech is highly spontaneous, there is often little or no acoustic evidence in the speech signal to indicate transitions from one sound unit to the next. Duration normalization depends on accurate boundary locations, and even our most accurate automatic segmentation technique, when applied in isolation, is not accurate enough for duration normalization to perform effectively.

Because our efforts to improve automatic segmentation of spontaneous speech have not been very fruitful, we have focused on developing duration normalization approaches that are more robust to boundary detection errors. We have also explored the use of duration normalization based on probabilistic identification of phone boundaries. Our most effective system uses three simple variants of duration normalization and an algorithm that combines multiple recognition hypotheses into a single best hypothesis. With this multi-pass approach, we have achieved significant improvements in recognition accuracy by applying duration normalization to a variety of spontaneous speech databases, including a large-scale broadcast news corpus. These techniques achieve a relative reduction in word error rate of 3.9%--7.7%, depending on the size and complexity of the recognition task.
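The core idea of duration normalization, giving every phone the same number of feature frames once its boundaries are known, can be illustrated with a minimal sketch. It is not the thesis's actual implementation: the abstract does not say how segments are resampled, so linear interpolation between neighboring frames is an assumption here, and the names `resample_segment`, `normalize_durations`, `boundaries`, and `target_len` are hypothetical. Boundaries are assumed to be given as frame indices, as in the known-boundary case described above.

```python
from typing import List, Sequence

Frame = List[float]


def resample_segment(frames: Sequence[Sequence[float]], target_len: int) -> List[Frame]:
    """Stretch or compress one phone segment to exactly target_len frames.

    Assumption: linear interpolation between adjacent frames; the source
    does not specify the resampling method.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("segment must contain at least one frame")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    out: List[Frame] = []
    for k in range(target_len):
        # Map output index k to a fractional position in the input segment.
        pos = k * (n - 1) / (target_len - 1) if target_len > 1 else (n - 1) / 2
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def normalize_durations(
    features: Sequence[Sequence[float]],
    boundaries: Sequence[int],
    target_len: int,
) -> List[Frame]:
    """Resample every phone segment to the same length.

    boundaries holds the start frame of each phone plus a final end index,
    e.g. [0, 3, 5] describes two phones: frames 0..2 and frames 3..4.
    """
    out: List[Frame] = []
    for start, end in zip(boundaries, boundaries[1:]):
        if end <= start:
            raise ValueError("boundaries must be strictly increasing")
        out.extend(resample_segment(features[start:end], target_len))
    return out


# Toy input: five one-dimensional frames, two phones of 3 and 2 frames.
features = [[0.0], [1.0], [2.0], [10.0], [20.0]]
normalized = normalize_durations(features, boundaries=[0, 3, 5], target_len=4)
```

After normalization both phones span four frames each, so the sequence passed to the recognizer has eight frames regardless of how long each phone originally lasted. The sketch also shows why boundary errors matter: a misplaced index in `boundaries` mixes frames from neighboring phones into the same resampled segment.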
Keywords/Search Tags:Recognition, Speech, Duration normalization, ASR systems, Accurate