
Computations and evaluations of an optimal feature-set for an HMM-based recognizer

Posted on: 1997-04-14
Degree: Ph.D
Type: Thesis
University: Brown University
Candidate: Mashao, Daniel Johannes
Full Text: PDF
GTID: 2468390014483471
Subject: Engineering
Abstract/Summary:
A speech recognition machine would offer many benefits and improve people's quality of life. The design of a speech recognition system can be divided into two parts, commonly known as the front-end and the back-end. The front-end converts the analog speech signal into features for classification. This thesis investigates optimal feature-sets for speech recognition. The objectives for an optimal feature-set are improved recognition performance, noise robustness, talker insensitivity and efficiency. Three problems make it difficult to find optimal features: (1) the amount of resources (time and computations) required to evaluate the performance of a feature-set, (2) the size of the feature space, and (3) the dependence of features upon some words in the vocabulary.

This thesis proposes solutions to all three problems. The evaluation problem is addressed by designing an advanced architecture. The architecture reconfigures itself for fast computation based on the source code and takes advantage of the structure of the semi-continuous hidden Markov model computations. This thesis demonstrates how an inexpensive reconfigurable system outperforms a fast general-purpose computer. The feature space problem is addressed by investigating discrete Fourier transform (DFT) based feature-sets. Two parameters are used to control the spectral compression of the features. The parameterized feature-set with mel-scale compression is shown to be superior. The parameterized system decreased the error rate of the standard mel-cepstrum LPC system by over 21% to 8.2%. Recognition performance on all highly confusable sets was improved. The DFT-based signal processing increased error rates for voiced-to-unvoiced confusions among stops, but distinguished place of articulation well. The decreased error rates on the nasals were expected, since LPC models spectral zeros poorly.

To improve performance on specific words in the vocabulary, the small but difficult nasal-set is investigated. A hierarchical method improved the performance on this set.

This thesis, perhaps for the first time, has shown that the mel-scale compression of the human auditory system is also ideal for machine speech recognition. The reconfigurable architecture will enable further investigations of complex parameterizations of the feature space.
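As a rough illustration of what mel-scale compression of a DFT spectrum can look like, the sketch below sums DFT power bins into triangular bands that are equally spaced on a mel-warped frequency axis. It is not the thesis's feature-set: the abstract does not name its two compression parameters, so `num_bands` and `break_hz` are hypothetical stand-ins, and the warping uses the widely quoted approximation mel = 2595 log10(1 + f/700).

```python
import math

def hz_to_mel(f_hz, break_hz=700.0):
    """Common mel approximation. break_hz (hypothetical knob) sets where
    the axis changes from roughly linear to roughly logarithmic."""
    return 2595.0 * math.log10(1.0 + f_hz / break_hz)

def mel_to_hz(mel, break_hz=700.0):
    """Inverse of hz_to_mel for the same break_hz."""
    return break_hz * (10.0 ** (mel / 2595.0) - 1.0)

def band_edges(num_bands, max_hz, break_hz=700.0):
    """Return num_bands + 2 edge frequencies (Hz), equally spaced on the
    warped axis between 0 and max_hz."""
    step = hz_to_mel(max_hz, break_hz) / (num_bands + 1)
    return [mel_to_hz(i * step, break_hz) for i in range(num_bands + 2)]

def compress_spectrum(power, sample_rate, num_bands, break_hz=700.0):
    """Sum DFT power bins into triangular bands on the warped axis.
    `power` is assumed to hold bins from 0 Hz up to sample_rate / 2."""
    nyquist = sample_rate / 2.0
    edges = band_edges(num_bands, nyquist, break_hz)
    bin_hz = nyquist / (len(power) - 1)
    bands = []
    for b in range(num_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        total = 0.0
        for k, p in enumerate(power):
            f = k * bin_hz
            if lo < f <= mid:      # rising slope of the triangle
                total += p * (f - lo) / (mid - lo)
            elif mid < f < hi:     # falling slope of the triangle
                total += p * (hi - f) / (hi - mid)
        bands.append(total)
    return bands
```

For example, `compress_spectrum(power, 8000, 10)` reduces a 129-bin power spectrum to 10 band energies, with narrow bands at low frequencies and progressively wider bands toward 4 kHz. Varying `num_bands` and `break_hz` changes how strongly the spectrum is compressed, which is the kind of parameterized search the abstract describes, under the assumptions stated above.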
Keywords/Search Tags:Speech recognition, Feature, Optimal, Computations