
The Algorithm of Embedded Continuous Speech Recognition

Posted on: 2008-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
Full Text: PDF
GTID: 2178360215983609
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the development of real-time speech recognition and mobile devices, embedded speech recognition has attracted increasing attention. Large-vocabulary continuous speech recognition (LVCSR), however, still falls short of practical deployment, so domain-specific ASR applications are likely to play the larger role on embedded devices in the near future and to bring more convenience to daily life.

Building on LVCSR techniques, it is feasible to realize a continuous speech recognition system on a low-resource platform. In this thesis, a medium-vocabulary continuous speech recognition system for constrained domains is built for embedded devices. It can be used for telephone information inquiry and for speech interaction on mobile devices such as PDAs and GPS units. Given an acceptable recognition performance, the emphasis is on efficient decoding. Starting from an LVCSR system, this smaller-vocabulary, limited-domain recognition technique is researched and developed toward practical use. Targeting continuous speech over a vocabulary of about 500 words and 100 sentence patterns, and taking low memory cost and low complexity into account, a complete speech recognition system is described.

The acoustic models are context-independent (monophone) HMMs with continuous Gaussian mixture densities, together with tied-mixture HMMs. A finite-state grammar in the form of a deterministic finite automaton (DFA) provides the language guidance. The pronunciation lexicon is organized as a prefix pronunciation tree, which removes a degree of redundant storage; the lexicon and the finite-state grammar together make up the whole search network. A traditional time-synchronous Viterbi beam search with state-level beam pruning serves as the first pass, finding the best state sequence and thus the best word sequence. Using the first-pass result, the second pass is a stack decoder.
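The space saving from the prefix pronunciation lexicon tree can be sketched briefly: words that share a phone prefix share tree nodes, so the shared arcs are stored only once. The words and phone transcriptions below are illustrative, not taken from the thesis.

```python
# Minimal sketch of a prefix pronunciation lexicon tree: words sharing
# a phone prefix share nodes, so redundant phone arcs are stored once.
# The lexicon entries below are illustrative examples.

class LexNode:
    def __init__(self):
        self.children = {}   # phone -> LexNode
        self.words = []      # words whose pronunciation ends at this node

def build_lexicon_tree(lexicon):
    root = LexNode()
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node.children.setdefault(ph, LexNode())
        node.words.append(word)
    return root

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())

lexicon = {
    "beijing":  ["b", "ei", "j", "ing"],
    "beihai":   ["b", "ei", "h", "ai"],
    "shanghai": ["sh", "ang", "h", "ai"],
}
tree = build_lexicon_tree(lexicon)
# "beijing" and "beihai" share the "b ei" prefix, so those two phone
# nodes are stored once: 11 nodes (incl. root) instead of 13 in a
# flat lexicon.
```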
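The first-pass decoder, a time-synchronous Viterbi beam search with state-level beam pruning, can be sketched in miniature. The HMM topology and scores below are illustrative only (a left-to-right model with self-loops and hand-picked log-probabilities), not the thesis's acoustic models.

```python
import math

# Sketch of time-synchronous Viterbi beam search with state-level
# beam pruning over a toy left-to-right HMM. Assumed interface:
# log_trans(s, s2) and log_emit(s, t) return log-probabilities.

def viterbi_beam(n_states, log_trans, log_emit, n_frames, beam=10.0):
    active = {0: 0.0}                      # state -> best log score so far
    for t in range(n_frames):
        nxt = {}
        for s, score in active.items():
            for s2 in (s, s + 1):          # self-loop or step forward
                if s2 >= n_states:
                    continue
                cand = score + log_trans(s, s2) + log_emit(s2, t)
                if cand > nxt.get(s2, -math.inf):
                    nxt[s2] = cand
        # Beam pruning: keep only states within `beam` of the frame's best.
        best = max(nxt.values())
        active = {s: v for s, v in nxt.items() if v >= best - beam}
    return active

# Toy run: 3 emitting states, 3 frames, emissions favouring state t at
# frame t, so only the diagonal path survives the beam.
final = viterbi_beam(3, lambda s, s2: -0.7,
                     lambda s, t: 0.0 if s == t else -5.0, 3, beam=3.0)
```

In a real decoder the active set would hold states of the compiled lexicon-plus-grammar network with word-history backpointers; the pruning step is the same idea, discarding hypotheses that fall too far below the current best.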
A confidence measure based on word posterior probabilities is added to rescore hypotheses during word expansion. The experiments give strong evidence that this confidence measure reduces insertion errors without adding much computational cost. On a 106×4 test set, the baseline achieves a best word accuracy of 96.65%.

Experiments with different knowledge sources and decoding parameters show that, at an acceptable recognition accuracy, the memory and computation costs are reasonable for the embedded platform. Decoding complexity has two main components: network expansion and likelihood computation. As a system tends toward fewer and simpler state distributions but more complex network connections, the output likelihood calculation accounts for a correspondingly smaller share of the total cost.

Based on these experimental results, the embedded platform and system architecture are chosen as follows. The floating-point digital signal processor, a TMS320C6713, runs at a 225 MHz clock rate with 16 MB of SDRAM and 2 MB of FLASH. To reduce computation, the whole system is split into two parts: an offline part for model initialization and construction of the static search network, and an online part for recognition, covering audio input, feature extraction, and decoding.

Finally, three fast Gaussian mixture selection algorithms are applied to reduce the likelihood computation cost. They assume that the previous frame is a good guide for the current one. The method that reuses the previous frame's computation result as the Gaussian pruning threshold gives the best experimental result: on the same tied-mixture models, word correctness drops only about 2% while the output likelihood computation is reduced by about 50%.
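The best-performing selection scheme, which reuses the previous frame's result as a pruning threshold, can be sketched as follows. This is a hedged reconstruction under assumptions not confirmed by the abstract: diagonal-covariance Gaussians, a best-component (max) approximation to the mixture sum, and per-dimension early abort; all parameters are illustrative.

```python
import math

def log_gauss_pruned(x, mean, inv_var, log_norm, floor):
    # Diagonal-covariance Gaussian log-density with early abort: the
    # running score can only decrease, so once it falls below `floor`
    # the component cannot win and evaluation is abandoned.
    s = log_norm
    for xi, mi, iv in zip(x, mean, inv_var):
        s -= 0.5 * iv * (xi - mi) ** 2
        if s < floor:
            return None                     # pruned early
    return s

def gmm_scores(frames, components, margin=6.0):
    # components: list of (log_weight, mean, inv_var, log_norm).
    # Because adjacent frames are acoustically similar, the previous
    # frame's best component score minus a margin serves as the pruning
    # floor for the current frame.
    scores = []
    prev_best = -math.inf                   # no pruning on the first frame
    for x in frames:
        floor = prev_best - margin
        best = -math.inf
        for log_w, mean, inv_var, log_norm in components:
            s = log_gauss_pruned(x, mean, inv_var, log_norm, floor - log_w)
            if s is not None:
                best = max(best, s + log_w)
        prev_best = best
        scores.append(best)                 # best-component approximation
    return scores
```

The margin trades accuracy for speed: a small margin skips more Gaussians but risks pruning a component that would have won on a frame that changed abruptly, which matches the reported ~2% correctness drop for ~50% less likelihood computation.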
Keywords/Search Tags: Speech recognition, decoding, lexicon, DFA, decoding complexity, memory evaluation, efficient Gaussian mixture computation