Research And Development Of Continuous Speech Recognition Based On HTK And Microsoft Speech SDK

Posted on:2008-02-11

Degree:Master

Type:Thesis

Country:China

Candidate:X Huang

GTID:2178360242478679

Subject:Computer software and theory

Abstract/Summary:

Speech recognition has developed rapidly in recent years. Enabling computers to understand human speech, and even to converse with people, is a long-standing goal that may be realized in the near future. This thesis studies continuous speech recognition.
It first introduces the fundamentals of speech recognition, with a detailed discussion of speech signal processing and speech recognition theory. The work then proceeds along two lines.
The first line takes a pattern recognition approach: it discusses speech feature extraction and the principles of recognition, and builds a corresponding recognition model. The speech signal is preprocessed and Mel-frequency cepstral coefficient (MFCC) features are extracted. Then, using monophone hidden Markov models (HMMs), a large-vocabulary continuous speech recognition (LVCSR) experimental system is built with HTK 3.4 as the platform and TIMIT as the corpus. A Gaussian mixture splitting experiment shows that, as the number of mixture components increases from 1 to 128, recognition accuracy rises from 47.01% to 62.33%.
Higher accuracy can be pursued with even more Gaussians, but Gaussian evaluation then takes up a larger share of recognition time. Likelihood-based statistical acoustic modeling of this kind is time-consuming enough that recognition becomes very slow, and some LVCSR systems may decode speech several times slower than real time. Efficient techniques are therefore needed to reduce the cost of likelihood computation without significantly degrading recognition accuracy. This thesis introduces three such techniques: partial distance elimination (PDE), best mixture prediction (BMP), and feature component reordering (FCR). Experiments show that combining these techniques is effective for fast Gaussian likelihood computation.
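The abstract does not give the thesis's implementation, so the following is only a minimal sketch of how partial distance elimination is commonly applied to a diagonal-covariance Gaussian mixture. It assumes the mixture likelihood is approximated by its best-scoring component. The function name `best_component_pde` and the `first` argument are hypothetical; `first` only imitates the idea behind best mixture prediction by evaluating a guessed component first, so that the pruning threshold is tight from the start.

```python
import math


def best_component_pde(x, weights, means, variances, first=0):
    """Return (index, log_likelihood) of the best-scoring diagonal Gaussian.

    Illustrative sketch only: each component is scored by the distance
    -2 * log(w_m * N(x; mu_m, var_m)), so the smallest distance wins.
    """
    n = len(weights)
    # Constant part of each distance: -2*log(w) + sum(log(2*pi*var)).
    consts = [
        -2.0 * math.log(w) + sum(math.log(2.0 * math.pi * v) for v in var)
        for w, var in zip(weights, variances)
    ]
    # Evaluate the guessed component first (assumed stand-in for BMP).
    order = [first] + [m for m in range(n) if m != first]

    best_idx, best_dist = -1, math.inf
    for m in order:
        dist = consts[m]
        pruned = False
        for xd, mu, var in zip(x, means[m], variances[m]):
            # Every added term is non-negative, so dist can only grow.
            dist += (xd - mu) ** 2 / var
            if dist >= best_dist:
                # Partial distance already no better than the best: stop early.
                pruned = True
                break
        if not pruned:
            best_idx, best_dist = m, dist
    return best_idx, -0.5 * best_dist
```

Because each per-dimension term is non-negative, abandoning a component once its partial distance reaches the current best cannot change which component wins. Placing the feature dimensions that tend to contribute most to the distance first, which is the idea behind feature component reordering, would make that early exit trigger sooner.
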
The second line concerns speech recognition software development: a speech recognition system for recording basketball game statistics is built. The thesis explains how to use the Microsoft Speech SDK as a voice interface and also introduces XML. As an example of getting started with SAPI, a domain-specific continuous speech recognition system is developed that can recognize a number of sentences and dozens of words; it is then used as the voice interface for recording basketball game statistics. Experiments show a recognition accuracy of 86%. Finally, methods for noise control and for improving the recognition rate are introduced.

Keywords/Search Tags:

continuous speech recognition, fast likelihood computation, Speech API

Related items

1	The Performance Optimization Research On Large Vocabulary Continuous Speech Recognition
2	Chinese Speech Recognition Technology And Its Application In Speech Separation
3	The Research And Implementation Of Continuous Speech Recognition System Based On Word Network Model
4	Research On Tibetan Non-specific Continuous Speech Recognition Based On Deep Learning
5	Research On Continuous Speech Command Recognition Technology Based On Aerocraft
6	The Algorithm Of Embedded Continuous Speech Recognition
7	Technology Of Tibetan Speech Recognition Based On Fast Walsh Transform
8	Research Of Speech Recognition And Its Application In The Speech Error Identifying System
9	Application And Research On Speech Recognition Technologies In Security Monitoring System
10	Research On Continuous Speech Recognition Technology In Noisy Environment