
Research And Development Of Continuous Speech Recognition Based On HTK And Microsoft Speech SDK

Posted on: 2008-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: X Huang
GTID: 2178360242478679
Subject: Computer software and theory
Abstract/Summary:
Speech recognition has developed rapidly in recent years. Making computers understand human speech, and even converse with people, is a long-standing dream that may be realized in the near future. The main purpose of this thesis is to discuss continuous speech recognition.

The thesis first introduces the basics of speech recognition, with a detailed discussion of speech signal processing and speech recognition theory. It then proceeds in two directions.

In the first direction, pattern recognition, the thesis discusses speech feature extraction and the principles of speech recognition, and builds the corresponding recognition model. The speech signal is first preprocessed and MFCC feature parameters are extracted. Then a large-vocabulary continuous speech recognition (LVCSR) experimental system is built on HMM-based monophone models, with HTK 3.4 as the platform and TIMIT as the corpus. A Gaussian mixture splitting experiment showed that as the number of mixture components increased from 1 to 128, recognition accuracy increased from 47.01% to 62.33%.

To reach higher recognition accuracy, even more Gaussians can be used, and the share of recognition time spent on Gaussian evaluation grows accordingly. This kind of likelihood-based statistical acoustic modeling is so time-consuming that recognition becomes very slow; some LVCSR systems might even decode speech several times slower than real time. It is therefore necessary to develop efficient techniques that reduce the cost of likelihood computation without significantly degrading recognition accuracy. The thesis introduces three such techniques: partial distance elimination (PDE), best mixture prediction (BMP), and feature component reordering (FCR). Experiments showed that combining these techniques is effective for fast Gaussian likelihood computation.
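The idea behind partial distance elimination can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes diagonal-covariance Gaussians and a decoder that only needs the best-scoring mixture component per frame (the common max approximation). The names `log_const`, `best_component_pde`, `first`, and `order` are hypothetical. The `first` argument stands in for best mixture prediction and `order` for feature component reordering; how the thesis chooses either is not stated in the abstract.

```python
import math


def log_const(weight, var):
    """Log of the mixture weight plus the Gaussian normalization term
    for one diagonal-covariance component (assumed model form)."""
    d = len(var)
    return math.log(weight) - 0.5 * (
        d * math.log(2.0 * math.pi) + sum(math.log(v) for v in var)
    )


def best_component_pde(x, means, variances, log_consts, first=0, order=None):
    """Return (index, log score) of the best-scoring component.

    score_m = log_consts[m] - 0.5 * sum_d (x[d] - mean[m][d])**2 / var[m][d]

    PDE: the weighted distance is accumulated one dimension at a time and
    the component is abandoned as soon as it can no longer beat the best
    score found so far.
    first: component evaluated first (hypothetical stand-in for BMP).
    order: dimension visiting order (hypothetical stand-in for FCR).
    """
    dims = list(order) if order is not None else list(range(len(x)))
    candidates = [first] + [m for m in range(len(means)) if m != first]
    best_idx, best_score = -1, -math.inf
    for m in candidates:
        # Component m wins only if its distance stays below this limit.
        limit = 2.0 * (log_consts[m] - best_score)
        dist = 0.0
        pruned = False
        for d in dims:
            diff = x[d] - means[m][d]
            dist += diff * diff / variances[m][d]
            if dist >= limit:  # cannot beat the current best: stop early
                pruned = True
                break
        if not pruned:
            best_idx, best_score = m, log_consts[m] - 0.5 * dist
    return best_idx, best_score
```

Evaluating a likely winner first gives a tight limit early, so more of the remaining components are abandoned after only a few dimensions; visiting the dimensions that tend to contribute the largest distances first has a similar effect. The result is the same component and score as a full evaluation, only with fewer arithmetic operations.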
The second direction is speech recognition software development: a speech recognition system for recording basketball game statistics was built. The thesis explains how to use the Microsoft Speech SDK as a voice interface and also introduces XML. As an introductory SAPI example, a domain-specific continuous speech recognition system was developed that can recognize a number of sentences and dozens of words. It was then used as the voice interface for the basketball statistics application. Experiments showed a recognition accuracy of 86%. Finally, methods for noise control and for improving the recognition rate are introduced.
Keywords/Search Tags: continuous speech recognition, fast likelihood computation, Speech API