Study Of Speech Recognition System For Mandarin Digit Based On HMM

Posted on: 2007-04-10  Degree: Master  Type: Thesis
Country: China  Candidate: Z G Hou
GTID: 2178360182488292  Subject: Circuits and Systems
Abstract/Summary:
Speech is an important means of communication between people and machines. Automatic Speech Recognition (ASR) technology lets machines understand human language and carry out the corresponding operations, and it can be applied widely in many areas. Although it has been studied for many years, many problems in ASR remain worth studying.

The acoustic models of speech production and speech perception, together with speech recognition theory, are the foundations of ASR, so each step of the ASR process is analyzed in detail. An improved spectral entropy algorithm is proposed for endpoint detection, and experimental results show that using this method improves the robustness of the system. The choice of speech feature parameters strongly affects the robustness and real-time performance of a speech recognition system. After short-time feature parameters and spectrograms are introduced, three approaches to extracting speech feature parameters, namely Linear Predictive Coding (LPC), Linear Predictive Cepstrum Coefficients (LPCC) and Mel-Frequency Cepstrum Coefficients (MFCC), are discussed in detail, and their distortion measures are then introduced.

DTW theory and HMM theory are discussed, and their applications in recognition are analyzed through MATLAB programs. Isolated-word recognition systems based on DTW, both speaker-independent and speaker-dependent, are discussed using different feature parameters. In addition, a small-vocabulary, speaker-independent speech recognition system based on HMM is constructed. Different feature parameters can be selected in this recognition system, which has good robustness. Experiments are conducted to recognize the Mandarin digits 0 to 9 with this system. The results show that 12-dimensional LPCC is the most effective feature, although the recognition rate of 26-dimensional MFCC is about 10% higher than that of 12-dimensional LPCC.
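To make the DTW-based isolated-word recognition mentioned above more concrete, the following is a minimal sketch in Python (the thesis itself used MATLAB, and this is not the author's implementation). It assumes each utterance is already a sequence of feature vectors, such as LPCC or MFCC frames; the function names `dtw_distance` and `recognize`, the Euclidean frame distance, and the unconstrained step pattern are illustrative assumptions rather than details taken from the thesis.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW distance between two sequences of feature vectors.

    Assumption: frames are compared with the Euclidean distance and the
    warping path may step horizontally, vertically, or diagonally.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of a repeated against b[j-1]
                cost[i][j - 1],      # frame of b repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the reference template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Toy one-dimensional "features"; real systems would use LPCC or MFCC frames.
templates = {"yi": [[1.0], [1.0]], "er": [[2.0], [2.0]]}
print(recognize([[1.1], [0.9], [1.0]], templates))
```

In this sketch each digit has a single reference template; a speaker-independent system would typically store several templates per digit or, as in the thesis, move to HMM-based modeling.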
Keywords/Search Tags: Automatic Speech Recognition (ASR), Linear Predictive Cepstrum Coefficients (LPCC), Mel-Frequency Cepstrum Coefficients (MFCC), Dynamic Time Warping (DTW), Hidden Markov Model (HMM)