Research On The Application Of Tone Information To Mandarin Speech Recognition

Posted on: 2011-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: P Wang
GTID: 2178360308955290
Subject: Signal and Information Processing
Abstract/Summary:
With the refinement of statistical pattern recognition theory and the growth of computing power, ASR (Automatic Speech Recognition) technology has made remarkable progress in recent years. Mandarin speech recognition shares the general problems of speech recognition in other languages, but it also has characteristics of its own, and tone information is one of them. Unlike in non-tonal languages, tone in Mandarin plays an important role in disambiguating confusable words, so it is key to improving the recognition rate of an ASR system. However, pitch, the feature used to represent tone, has particular properties such as discontinuity and a supra-segmental nature. How to use tone information effectively has therefore become an active research topic. Building on the single stream HMM (Hidden Markov Model), we propose a double stream HMM. In a series of experiments, the double stream HMM outperforms both the traditional single stream HMM and the MSD-HMM (Multi-Space Probability Distribution Hidden Markov Model). We then introduce the idea of double stream modeling into discriminative training of the acoustic model and propose synchronous double stream discriminative training. As a result of these efforts, tone information is used more effectively and the performance of Mandarin speech recognition improves substantially. The thesis is organized as follows:

Chapter 1 is the introduction. It first briefly describes the background and development of ASR, then explains the principle and framework of an ASR system, and finally introduces the concept, characteristics and difficulties of Mandarin speech recognition.

Chapter 2 introduces the ASR system based on the HMM framework. It covers in turn the mathematical definition of the HMM, the three basic problems of the HMM, and the ASR system built on HTK (Hidden Markov Model Toolkit).

Chapter 3 describes Mandarin tone information and the pitch feature used to represent tone, and presents two algorithms for extracting that feature: SHS (Sub-Harmonic Summation) and the ETSI pitch extraction algorithm.

Chapter 4 introduces how to model the tonal feature together with the acoustic feature. The first method is the traditional single stream HMM, the second is the double stream HMM, and the third is the MSD-HMM, which is intended to solve the problem of pitch discontinuity. The chapter compares the theory, advantages and disadvantages of the three methods. The experimental results indicate that the double stream HMM performs better than the other two methods.

Chapter 5 discusses in detail the criteria and parameter update algorithm of single stream discriminative training, and gives a rigorous derivation and proof of the parameter update algorithm for synchronous double stream discriminative training. The subsequent experiments show that synchronous double stream discriminative training is more robust than single stream discriminative training and performs well across a variety of task sets.

Chapter 6 concludes the thesis. Possible improvements are given and the future development of ASR is discussed.
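To make the double stream idea in the abstract concrete, the following is a minimal sketch of how a two-stream HMM state is commonly scored: the spectral stream and the pitch stream each get their own density, and the per-stream log-likelihoods are combined with stream weights. This is an illustration under stated assumptions, not the thesis's implementation: each stream is modeled here by a single diagonal-covariance Gaussian (real systems typically use mixtures), and the names `gaussian_logpdf`, `double_stream_log_emission`, `spectral_weight` and `pitch_weight` are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def double_stream_log_emission(spectral_obs, pitch_obs, state,
                               spectral_weight=1.0, pitch_weight=1.0):
    """Weighted sum of per-stream log-likelihoods for one HMM state.

    `state` is a hypothetical dict holding a (mean, var) pair per stream.
    A single Gaussian per stream is an assumption made for brevity.
    """
    spectral_ll = gaussian_logpdf(spectral_obs, *state["spectral"])
    pitch_ll = gaussian_logpdf(pitch_obs, *state["pitch"])
    return spectral_weight * spectral_ll + pitch_weight * pitch_ll


# Toy state: 2-dimensional spectral stream, 1-dimensional pitch stream.
state = {
    "spectral": ([0.0, 0.0], [1.0, 1.0]),
    "pitch": ([0.0], [1.0]),
}
score = double_stream_log_emission([0.0, 0.0], [0.0], state,
                                   spectral_weight=1.0, pitch_weight=0.5)
```

Keeping the streams separate lets the pitch stream be weighted independently of the spectral stream, whereas a single stream model concatenates both features into one vector scored by one density.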
Keywords/Search Tags:Single stream HMM, Multi-space probability distribution HMM, Double stream HMM, Tone information, Single stream discriminative training, Synchronous double stream discriminative training