Research on the Use of Tone Information in Mandarin Automatic Speech Recognition Systems

Posted on: 2008-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: S Qiang
Full Text: PDF
GTID: 2178360212984920
Subject: Computer applications
Abstract/Summary:
It is commonly believed that tone information plays an important role in spoken tonal languages such as Mandarin Chinese, because tone is lexical: it distinguishes one word from another. Succinct tone modeling is critical to the performance of a state-of-the-art speech recognition system. However, pitch, which carries tone, cannot be modeled by an ordinary HMM because of its particular properties. Two properties of tone have traditionally made tone modeling difficult:

a) Tone is carried by perceivable pitch (represented numerically by F0) in the voiced part of a syllable, and no pitch is perceived in the unvoiced region. As a result, a continuous HMM cannot be applied directly to tone modeling, since the F0 trajectory is discontinuous at the junctures between neighboring voiced and unvoiced segments.

b) Tone is a supra-segmental feature that can span multiple voiced segments. Extracting tonal features therefore requires a time window longer than the one used for extracting spectral features.

In this dissertation, we propose two solutions, one for each of these challenges:

a) We apply MSD (multi-space distribution) to tone modeling. MSD models two probability spaces, a discrete one for the unvoiced region and a continuous one for the voiced pitch contour, as a linearly weighted mixture. This provides a nearly ideal way to model F0 without the heuristic assumptions required by the traditional F0 interpolation method.

b) We propose a two-pass search strategy for improving tonal syllable recognition. In the first pass, a tonal syllable lattice is generated with embedded tone information using MSD-HMM. In the second pass, outline F0 features are first extracted using the syllable boundaries given by the first pass and are then modeled by explicit tone models. Scores computed from the trained explicit tone models are combined with the tonal syllable scores obtained in the first pass, and the combined score is used to find the best path in the lattice.

For comparison, we implemented two baseline systems: one using MFCC features only, and one using the traditional pitch interpolation proposed by IBM. The results show that the MSD method substantially outperforms both baselines. With the two-pass rescoring method, experiments on the same Mandarin database show that the second pass with enhanced tone models yields an additional 8.7% relative error reduction in tonal syllable recognition.
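
The linearly weighted mixture of a discrete unvoiced space and a continuous voiced space described for MSD can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the use of one-dimensional Gaussians for the voiced space, and all parameter values are assumptions chosen purely for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def msd_likelihood(f0, unvoiced_weight, voiced_components):
    """Observation likelihood of one frame under a multi-space distribution.

    f0: log-F0 value for a voiced frame, or None for an unvoiced frame.
    unvoiced_weight: weight of the discrete (zero-dimensional) unvoiced space.
    voiced_components: list of (weight, mean, variance) tuples describing the
        continuous voiced space (hypothetical parameterization).
    """
    total = unvoiced_weight + sum(w for w, _, _ in voiced_components)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("space weights must sum to 1")
    if f0 is None:
        # Unvoiced frame: only the discrete space contributes.
        return unvoiced_weight
    # Voiced frame: weighted sum of the continuous densities.
    return sum(w * gaussian_pdf(f0, m, v) for w, m, v in voiced_components)


# Illustrative values only; they are not taken from the thesis.
voiced = [(0.7, 5.0, 0.04)]
print(msd_likelihood(None, 0.3, voiced))  # unvoiced frame
print(msd_likelihood(5.1, 0.3, voiced))   # voiced frame
```

Because voiced and unvoiced frames are scored by the same state distribution, the F0 stream needs no interpolated values in unvoiced regions, which is the property the abstract contrasts with the interpolation baseline.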
Keywords/Search Tags: Speech Recognition, Mandarin Chinese, Tone Modeling, MSD, Two-Pass Decoding