
The Performance Optimization Research On Large Vocabulary Continuous Speech Recognition

Posted on: 2010-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: J L Ou
GTID: 2178360275994203
Subject: Computer application technology
Abstract/Summary:
Large vocabulary continuous speech recognition (LVCSR) is one of the most important topics in spoken language processing. It draws on many knowledge sources and techniques, such as the acoustic model, the language model and the decoding algorithm. This thesis introduces the fundamentals of speech recognition and then discusses how to improve both the real-time performance and the recognition accuracy of speech recognition systems.

Most LVCSR systems are built on statistical models and use continuous-density hidden Markov models (HMMs) to model the acoustics of the speech signal. In such a system, the output distribution of each state is a Gaussian mixture model (GMM) made up of many Gaussian components. For likelihood-based recognizers of this kind, estimating the state likelihoods is computationally intensive and is one of the main reasons recognition is slow. Efficient techniques are therefore needed to reduce the cost of likelihood computation with no degradation, or no significant degradation, in recognition accuracy. An analysis of likelihood computation in continuous-density HMM-based LVCSR systems shows that the conventional sequential computation is time-consuming, and that the computation itself can be carried out in parallel. This thesis therefore presents a SIMD-based algorithm that performs the likelihood computation in parallel. An experimental platform is built with the HTK 3.4 toolkit as the baseline system and the TIMIT and WSJ0 corpora as experimental data, and on this platform the algorithm is compared with other fast computation techniques: partial distance elimination (PDE), best mixture prediction (BMP), feature component reordering (FCR) and Gaussian selection (GS). Experimental results show that the SIMD-based algorithm significantly reduces the time spent on likelihood computation without any loss of recognition accuracy.
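The abstract does not reproduce the SIMD implementation, so the following is only a minimal pure-Python sketch of the computation it targets, assuming diagonal-covariance Gaussians (the usual HTK setup). `distance` evaluates the Mahalanobis term one dimension at a time, as in the conventional sequential approach. `lane_distance` is a hypothetical helper that regroups the same sum into four lock-step accumulators reduced at the end, which is the data layout a 4-wide SIMD register (for example 128-bit SSE holding four single-precision floats) exploits. `best_component_pde` illustrates partial distance elimination, one of the baselines named above, under the max-component approximation. Python executes none of this in parallel; the point is the structure, not the speed.

```python
import math

def gconst(var):
    # Component constant D*log(2*pi) + log|Sigma| for a diagonal covariance.
    return len(var) * math.log(2 * math.pi) + sum(math.log(v) for v in var)

def distance(x, mu, var):
    # Sequential Mahalanobis distance: one feature dimension per step.
    return sum((xd - md) ** 2 / vd for xd, md, vd in zip(x, mu, var))

def lane_distance(x, mu, var, lanes=4):
    # Hypothetical SIMD-style layout: `lanes` accumulators updated in
    # lock-step, then reduced (a "horizontal add") at the end.
    acc = [0.0] * lanes
    for start in range(0, len(x), lanes):
        for k in range(lanes):
            d = start + k
            if d < len(x):  # real SIMD code would zero-pad the vectors instead
                acc[k] += (x[d] - mu[d]) ** 2 / var[d]
    return sum(acc)

def component_score(x, w, mu, var, dist=distance):
    # log(w * N(x; mu, diag(var)))
    return math.log(w) - 0.5 * (gconst(var) + dist(x, mu, var))

def state_loglik(x, gmm, dist=distance):
    # Exact state log-likelihood: log-sum-exp over all mixture components.
    scores = [component_score(x, w, mu, var, dist) for w, mu, var in gmm]
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))

def best_component_pde(x, gmm):
    # Max-component approximation with partial distance elimination:
    # abandon a component as soon as its partial score falls below the best
    # complete score so far. Returns (best_score, dimensions_evaluated).
    best, work = -math.inf, 0
    for w, mu, var in gmm:
        score = math.log(w) - 0.5 * gconst(var)  # upper bound; only decreases
        for xd, md, vd in zip(x, mu, var):
            score -= 0.5 * (xd - md) ** 2 / vd
            work += 1
            if score < best:
                break
        else:
            best = score  # fully evaluated and not worse than the previous best
    return best, work
```

Here `gmm` is a list of `(weight, mean, variance)` tuples. The lane version returns the same likelihood as the sequential one up to floating-point rounding, matching the "no loss of accuracy" property, whereas PDE returns only the best single-component score, an approximation to the exact log-sum-exp value from `state_loglik`.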
The SIMD-based algorithm also outperforms the other fast computation techniques.

To integrate semantic knowledge with the N-gram language model and thereby improve LVCSR accuracy, this thesis describes the theory of latent semantic analysis (LSA) and the techniques for applying it in an LVCSR system. An LSA model is then constructed on the WSJ0 text corpus and combined with a conventional N-gram model by interpolation, giving a hybrid language model that incorporates semantic knowledge. To optimize the hybrid model, the k-means algorithm is applied to cluster vectors in the LSA space, with a density function used to initialize the centroids, and a method for smoothing the probabilities is proposed. Perplexity tests and continuous speech recognition experiments on the WSJ0 corpus show that the hybrid language model outperforms the corresponding N-gram model and improves LVCSR recognition accuracy to some extent.
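The abstract does not give the exact LSA probability or interpolation formulas, so the sketch below is hedged throughout. It assumes word vectors already taken from a truncated SVD of the word-document matrix, represents the history by the mean of its words' vectors (a hypothetical choice), maps cosine similarity to a probability by shifting it to be non-negative and normalizing over the vocabulary (one common choice, not necessarily the author's), and combines it with the N-gram probability by linear interpolation with a hypothetical weight `lam`. The k-means clustering and the proposed smoothing method are not shown, and the vocabulary, vectors and N-gram probability are toy values.

```python
import math

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

def centroid(vectors):
    # Hypothetical history representation: mean of the LSA vectors seen so far.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lsa_probs(word_vecs, hist_vec, gamma=1.0):
    # P_lsa(w | history): cosine shifted into [0, 2], sharpened by gamma,
    # then normalised over the vocabulary.
    scores = {w: (cosine(v, hist_vec) + 1.0) ** gamma for w, v in word_vecs.items()}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

def hybrid_prob(p_ngram, p_lsa, lam=0.7):
    # Linear interpolation: lam * N-gram + (1 - lam) * LSA.
    return lam * p_ngram + (1.0 - lam) * p_lsa

def perplexity(probs):
    # exp of the average negative log-probability of the test words.
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# Toy example with hypothetical 2-d LSA vectors.
word_vecs = {"stock": [1.0, 0.0], "market": [0.9, 0.1], "rain": [0.0, 1.0]}
hist = centroid([word_vecs["stock"]])
p_lsa = lsa_probs(word_vecs, hist)
p_ngram = 0.2  # hypothetical P_ngram("market" | history)
p_mix = hybrid_prob(p_ngram, p_lsa["market"])
print(round(p_lsa["market"], 3), round(p_mix, 3))  # prints 0.399 0.26
```

Because "market" lies close to the history in the LSA space, the interpolated probability exceeds the N-gram estimate alone, which lowers perplexity on that word; semantically unrelated words such as "rain" receive less LSA mass.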
Keywords/Search Tags: Fast Likelihood Computation, Latent Semantic Analysis, Large Vocabulary Continuous Speech Recognition