Research On Korean Spoken Language Identification

Posted on:2014-07-18

Degree:Master

Type:Thesis

Country:China

Candidate:S D Lu

GTID:2268330401960673

Subject:Computer application technology

Abstract/Summary:

Language identification is an important research topic in the field of speech recognition. To date, most speech recognition systems are trained on a single language, so existing recognition strategies lose effectiveness when applied to unknown languages or multilingual speech. With the rapid development of science and technology, the demand for multilingual information services and spoken national-language translation systems is increasingly urgent, which makes the research and application of language identification techniques especially important. Language identification poses more difficulties and challenges than single-language speech recognition, because it requires analyzing language characteristics across a multilingual space. Research on Korean language identification within a multilingual framework covering Korean, Chinese and English therefore has the same academic value and practical significance as single-language speech recognition. This dissertation proposes a Korean language identification method based on the special syllables and prosodic features of Korean.

First, the common auxiliary words and suffixes of Korean were obtained by analyzing actual Korean texts according to Korean grammatical forms. Following the rules of Korean phonetic change, the actual pronunciations of these auxiliary words and suffixes in the flow of Korean speech were also obtained.
The eight most frequent syllables of the Korean auxiliary words and suffixes were taken as the special syllables. After the MFCC and LPCC features were brought to a unified dimension by a time-frame adjustment network, an artificial neural network was trained on the adjusted features to serve as the special-syllable classifier, one of the base classifiers.

Then, five audio features were extracted: pitch, intensity, formant, energy and pronunciation rate. For the first four features, statistics comprising the mean, variation range, maximum, minimum and variance were computed. The support vector machine base classifier SVM_FF was trained on the statistics of pitch and formant, while the base classifier SVM_IER was trained on the pronunciation rate and the statistics of intensity and energy.

Finally, whether a given audio file is Korean was determined by majority rule, combining the special-syllable classifier with the base classifiers SVM_FF and SVM_IER.

The experimental results show that the proposed Korean language identification method, based on Korean special syllables and prosodic features, achieves a recognition rate of 87.25%. It distinguishes Korean audio files from Chinese and English ones effectively, which supports the rationality and validity of the method presented in this dissertation.
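The prosodic statistics and the majority rule described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The function names, the reading of "variation range" as maximum minus minimum, the use of population variance, the treatment of formant as a single track, and the ordering of values inside each vector are all assumptions made for illustration. Feature extraction and classifier training are not shown, and the numbers in the usage lines are made-up placeholders.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def track_statistics(values: Sequence[float]) -> List[float]:
    """Return [mean, variation range, maximum, minimum, variance] for one feature track.

    Assumption: "variation range" is max - min, and variance is the
    population variance; the abstract does not specify either.
    """
    hi, lo = max(values), min(values)
    return [mean(values), hi - lo, hi, lo, pvariance(values)]


def build_svm_inputs(tracks: Dict[str, Sequence[float]], rate: float) -> Dict[str, List[float]]:
    """Assemble hypothetical input vectors for the two prosodic base classifiers.

    SVM_FF uses the statistics of pitch and formant; SVM_IER uses the
    pronunciation rate plus the statistics of intensity and energy.
    """
    ff = track_statistics(tracks["pitch"]) + track_statistics(tracks["formant"])
    ier = [rate] + track_statistics(tracks["intensity"]) + track_statistics(tracks["energy"])
    return {"SVM_FF": ff, "SVM_IER": ier}


def majority_vote(votes: Sequence[bool]) -> bool:
    """Return True when more than half of the base classifiers vote 'Korean'."""
    return sum(votes) > len(votes) / 2


# Illustrative usage with placeholder per-frame values.
tracks = {
    "pitch": [110.0, 130.0, 120.0],
    "formant": [500.0, 520.0, 510.0],
    "intensity": [60.0, 65.0, 62.0],
    "energy": [0.2, 0.4, 0.3],
}
inputs = build_svm_inputs(tracks, rate=4.5)

# Hypothetical decisions of the syllable classifier, SVM_FF and SVM_IER.
is_korean = majority_vote([True, False, True])
```

With three base classifiers and a binary decision, the majority rule can never tie, so two agreeing classifiers always settle the outcome.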

Keywords/Search Tags:

Korean language identification, special syllable, prosodic feature, classifier combination, artificial neural network, support vector machine

Related items

1	Research On Minority Language Recognition WEKA Platform And Multi-classifier
2	Study On Seal Identification
3	Convolutional Neural Network Based On Improved Support Vector Machine Research On Image Recognition Method
4	Research And Implementation In Face Recognition Based On Multiple Classifiers
5	Design And Combination On Classifier
6	Modulation Format Identification Technology And Application Using Support Vector Machine In Elastic Optical Networks
7	Research On Speech Emotion Recognition Based On Multiple Feature Combination
8	Support Vector Machine Based Language Recognition
9	Classification Of Polarimetric SAR Data By The Combination Of Support Vector Machine Classifier And Decision Tree Classifier
10	Research On Peer To Peer Traffic Identification Method Based On Artificial Bee Colony Algorithm And Wavelet Support Vector Machine