Research On Korean Spoken Language Identification

Posted on: 2014-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: S D Lu
GTID: 2268330401960673
Subject: Computer application technology
Abstract/Summary:
Language identification is an important research problem in the field of speech recognition. Most speech recognition systems to date are trained on a single language, so existing recognition strategies become ineffective when faced with unknown languages or multilingual speech. With the rapid development of science and technology, the demand for multilingual information services and for translation systems for spoken national languages is increasingly urgent, which makes research on and application of language identification techniques highly important. Language identification poses more difficulties and challenges than single-language speech recognition, because it requires analyzing language characteristics across a multilingual space. Research on Korean language identification within a multilingual framework covering Korean, Chinese and English therefore has the same academic value and practical significance as single-language speech recognition. This dissertation proposes a Korean language identification method based on the special syllables and prosodic features of Korean. First, the common auxiliary words and suffixes of Korean were obtained by analyzing actual Korean texts according to Korean grammatical forms. Then, following the phonetic change rules of Korean, the actual pronunciations of these auxiliary words and suffixes in continuous Korean speech were obtained.
The eight most frequent syllables of these auxiliary words and suffixes were taken as the special syllables. After the MFCC and LPCC features were brought to a uniform dimension by a time-frame adjustment network, an artificial neural network was trained on the adjusted features to serve as the special syllable classifier, one of the base classifiers. Next, five audio features were extracted: pitch, intensity, formant, energy and pronunciation rate. For the first four, the mean, range of variation, maximum, minimum and variance were computed. The support vector machine base classifier SVM_FF was trained on the statistics of pitch and formant, while the base classifier SVM_IER was trained on pronunciation rate and the statistics of intensity and energy. Finally, whether a given audio file is Korean was decided by majority vote over the special syllable classifier, SVM_FF and SVM_IER. The experimental results show that the proposed Korean language identification method, based on Korean special syllables and prosodic features, achieves a recognition rate of 87.25%. It distinguishes Korean audio files from Chinese and English ones effectively, which indicates that the method presented in this dissertation is reasonable and valid.
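The two simplest steps of the pipeline, the per-feature statistics and the majority vote, can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names `contour_statistics` and `majority_vote`, the sample pitch values, the sample votes, and the use of population variance are all assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import mean, pvariance


def contour_statistics(values):
    """Return [mean, range, maximum, minimum, variance] of one feature contour.

    Population variance is an assumption; the abstract does not say
    which variance estimator was used.
    """
    hi, lo = max(values), min(values)
    return [mean(values), hi - lo, hi, lo, pvariance(values)]


def majority_vote(votes):
    """Return True when more than half of the base classifiers vote 'Korean'."""
    return sum(votes) > len(votes) / 2


# Hypothetical pitch contour (Hz) for one audio file.
pitch = [100.0, 110.0, 120.0, 130.0]
pitch_stats = contour_statistics(pitch)

# Hypothetical decisions of the special syllable classifier, SVM_FF and SVM_IER.
is_korean = majority_vote([True, False, True])
```

In this sketch the statistics of each contour would be concatenated into the input vector of the corresponding SVM, and with three base classifiers the vote can never tie.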
Keywords/Search Tags: Korean language identification, special syllable, prosodic feature, classifier combination, artificial neural network, support vector machine