A speech recognition IC with an efficient MFCC extraction algorithm and multi-mixture models | Posted on:2007-03-26 | Degree:Ph.D | Type:Thesis | University:The Chinese University of Hong Kong (Hong Kong) | Candidate:Han, Wei | Full Text:PDF | GTID:2448390005467746 | Subject:Engineering | Abstract/Summary:
Automatic speech recognition (ASR) by machine has received a great deal of attention over the past decades. Speech recognition algorithms based on Mel frequency cepstrum coefficients (MFCC) and the hidden Markov model (HMM) achieve better recognition performance than other speech recognition algorithms and are widely used in many applications. In this thesis, a speech recognition system with an efficient MFCC extraction algorithm and multi-mixture models is presented. It is composed of two parts: an MFCC feature extractor and an HMM-based speech decoder.
In the conventional MFCC feature extraction algorithm, speech is divided into short overlapping frames. The existing extraction algorithm requires a large amount of computation and is not suitable for hardware implementation. We have developed a hardware-efficient MFCC feature extraction algorithm in our work. The new algorithm reduces the computational power by 54% compared to the conventional algorithm, with only a 1.7% reduction in recognition accuracy.
For the HMM-based decoder of the speech recognition system, it is advantageous to use models with multiple mixtures, but the calculation becomes more complicated as the number of mixtures grows. Using a table look-up method proposed in this thesis, the new design can handle up to 16 states and 8 mixtures. The design can easily be extended to handle models with more states and mixtures.
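The abstract does not spell out the table look-up method, so the following is only an illustrative sketch of one common way a look-up table can simplify multi-mixture calculations: combining per-mixture log-likelihoods with a precomputed table of log(1 + exp(-d)) instead of evaluating exponentials at run time. It is not necessarily the method used in the thesis. The table size, the cut-off `MAX_DIFF`, and all function names are assumptions chosen for illustration.

```python
import math

# Hypothetical parameters for illustration only; not taken from the thesis.
TABLE_SIZE = 256
MAX_DIFF = 16.0                 # beyond this gap the smaller term adds only about 1e-7
STEP = MAX_DIFF / TABLE_SIZE    # table resolution in the log domain

# Precompute log(1 + exp(-d)) for d = 0, STEP, 2*STEP, ...
LOG_ADD_TABLE = [math.log1p(math.exp(-i * STEP)) for i in range(TABLE_SIZE)]


def log_add(a, b):
    """Approximate log(exp(a) + exp(b)) with a single table look-up."""
    hi, lo = (a, b) if a >= b else (b, a)
    diff = hi - lo
    if diff >= MAX_DIFF:
        return hi               # the smaller term is negligible
    return hi + LOG_ADD_TABLE[int(diff / STEP)]


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture_likelihood(x, log_weights, means, variances):
    """Log-likelihood of x under a Gaussian mixture, summed with log_add."""
    total = None
    for lw, mean, var in zip(log_weights, means, variances):
        term = lw + log_gaussian_diag(x, mean, var)
        total = term if total is None else log_add(total, term)
    return total
```

In this sketch, each additional mixture costs one comparison, one subtraction, one table access, and one addition, which is the kind of operation that maps well onto hardware. The approximation error per `log_add` call is bounded by the table step, so a finer table trades memory for accuracy.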
We have implemented the new algorithm on an Altera FPGA chip using fixed-point calculation and tested the FPGA chip with speech data from the AURORA 2 database, a well-known database designed to evaluate the performance of speech recognition algorithms in noisy conditions [27]. The recognition accuracy of the new system is 91.01%. A conventional software recognition system running on a PC using 32-bit floating-point calculation has a recognition accuracy of 94.65%.
Keywords/Search Tags: | Recognition, Algorithm, Efficient MFCC, Models