
Real-time speaker-independent large vocabulary continuous speech recognition

Posted on: 2006-02-28    Degree: Ph.D    Type: Dissertation
University: University of Missouri - Columbia    Candidate: Li, Xiaolong    Full Text: PDF
GTID: 1458390008950235    Subject: Computer Science
Abstract/Summary:
In this dissertation, a real-time decoding engine for speaker-independent large vocabulary continuous speech recognition (LVCSR) is presented. An overview of state-of-the-art decoding algorithms for LVCSR is given first. Since accuracy, speed, and memory cost are three indispensable and correlated performance measures for a practical continuous speech recognition system, all three aspects are carefully considered, with the main innovations in fast and memory-efficient decoding algorithms.

For accuracy, the developed system uses cross-word triphone Hidden Markov Models (HMMs), which have been shown to yield significantly higher accuracy than within-word triphone HMMs. With cross-word triphone models, however, the search space grows dramatically compared with a system using within-word triphone models. Cross-word language model lookahead and fan-out arc tying are used to keep the search space as compact as possible, and five heuristic pruning methods and two lookahead techniques are further exploited to reduce the search space with little or no loss of accuracy.

For speed and memory cost, a novel algorithm, Order-Preserving Language Model Context Pre-computing (OPCP), is proposed for fast language model (LM) lookup, yielding significant improvements in both overall decoding time and memory usage without any decrease in recognition accuracy. OPCP is a novel integration of two previously proposed methods: Minimum Perfect Hashing (MPH) and Language Model Context Pre-computing (LMCP). By reducing hashing operations through order-preserving access of LM scores, OPCP cuts LM lookup time effectively. At the same time, OPCP significantly reduces memory cost because the hashing keys are smaller and only the last-word index of each N-gram needs to be stored. Experimental results are reported on two LVCSR tasks (Wall Street Journal 20K and Switchboard 33K) with three sizes of trigram LMs (small, medium, and large). Compared with the MPH and LMCP methods, OPCP reduced LM lookup time from about 30%∼80% of total decoding time to about 8%∼14%, without any loss of word accuracy. Except for the small LM, the total memory cost of OPCP for LM lookup and storage was about the same as or less than that of the original N-gram LM storage, and was much less than that of the compared methods. The time and memory savings of OPCP in LM lookup became more pronounced as the LM size increased.

Using the OPCP method and the other optimizations mentioned above, our one-pass LVCSR decoding engine, named TigerEngine, reached real-time speed on both the Wall Street Journal 20K and the Switchboard 33K tasks, on a Dell workstation with one 3.2 GHz Xeon CPU.
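To make the OPCP idea above concrete, the following is a minimal, hypothetical Python sketch, not the dissertation's actual code or data structures. It assumes trigrams are grouped by their two-word context, that only sorted last-word IDs are stored per context, and that the decoder queries candidate last words in ascending word-ID order, so one hash lookup per context plus a single ordered pass replaces per-word hashing. The table contents, word IDs, scores, and names below are purely illustrative.

    from bisect import bisect_left

    # Toy trigram storage: context (w1, w2) -> (sorted last-word IDs, parallel scores).
    # All values are made up for illustration only.
    TRIGRAMS = {
        (3, 7): ([2, 5, 9, 14], [-1.2, -0.8, -2.1, -3.0]),
        (7, 2): ([1, 5, 8],     [-0.5, -1.9, -2.4]),
    }
    BACKOFF_SCORE = -5.0  # placeholder back-off value for unseen trigrams

    def lookup_ordered(context, sorted_last_words):
        """Return LM scores for candidate last words of one context.

        Assumes sorted_last_words is in ascending word-ID order, so the stored
        successor list can be walked once (order-preserving access) instead of
        hashing each (context, word) pair separately.
        """
        last_ids, scores = TRIGRAMS.get(context, ([], []))  # single hash lookup
        out, i = [], 0
        for w in sorted_last_words:
            i = bisect_left(last_ids, w, i)            # advance monotonically
            if i < len(last_ids) and last_ids[i] == w:
                out.append(scores[i])
            else:
                out.append(BACKOFF_SCORE)              # back off for a missing trigram
        return out

    if __name__ == "__main__":
        # All successor scores for context (3, 7) retrieved with one hash operation.
        print(lookup_ordered((3, 7), [2, 9, 11, 14]))   # [-1.2, -2.1, -5.0, -3.0]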
Keywords/Search Tags: Continuous speech, Time, Recognition, Large, LM lookup, OPCP, LVCSR, Decoding