Research On Minority Language Identification Based On CV-Syllable Features

Posted on:2013-05-21

Degree:Master

Type:Thesis

Country:China

Candidate:F L Kou

Full Text:PDF

GTID:2248330374959626

Subject:Signal and Information Processing

Abstract/Summary:

Automatic language identification (LID) is the task of identifying the language spoken in an utterance from its speech signal. As global communication grows, LID plays an important role in information services, military security and other domains. It remains a challenging problem even though it has been studied for almost 40 years. Humans draw on several kinds of cues when identifying a language, and both the identification rate and the computational cost of an LID system depend on feature extraction, so feature extraction is the key to LID. Most LID systems are based on either segmental features or phone recognition. The former is time-consuming and computationally expensive; the latter requires segmented and labeled speech corpora built by trained human annotators, and it ports poorly to new languages. The syllable is the most natural unit of speech, and it offers better representational and durational stability. This thesis studies an LID method based on automatic pseudo-syllable extraction. The main work includes:
(1) To extract pseudo-syllables from speech automatically, we introduce the notion of the CV-syllable, which consists of a consonant segment and a vowel segment, and devise an algorithm for automatic CV-syllable extraction. For every CV-syllable we then extract the duration of each segment, the mean of the Mel-frequency cepstral coefficients (MFCCs) of each segment, the variance of the MFCCs, and the cepstral distance between the two segments. These parameters constitute the CV-syllable feature vector.
(2) In the system implementation, Gaussian mixture models (GMMs) and language models (LMs) are each used to model the languages.
(3) Experiments are performed on Mandarin and six minority languages. CV-syllable features are extracted and modeled with GMMs and with LMs, and the resulting LID systems are then tested.
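The CV-syllable feature vector described in item (1) can be illustrated with a minimal sketch. It assumes that the consonant and vowel segments have already been located and that each is available as a list of MFCC frames. The function names, the 10 ms frame shift, the ordering of the vector entries, and the use of a Euclidean distance between the segment means as the cepstral distance are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def mean_vector(frames):
    """Per-coefficient mean over a list of MFCC frames."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def variance_vector(frames, mean):
    """Per-coefficient population variance over a list of MFCC frames."""
    n = len(frames)
    return [sum((x - m) ** 2 for x in col) / n
            for col, m in zip(zip(*frames), mean)]

def cv_syllable_features(consonant_frames, vowel_frames, frame_shift_s=0.01):
    """Build one feature vector for a CV-syllable (hypothetical layout).

    Layout assumed here: [C duration, V duration, C mean, V mean,
    C variance, V variance, cepstral distance].
    """
    c_mean = mean_vector(consonant_frames)
    v_mean = mean_vector(vowel_frames)
    c_var = variance_vector(consonant_frames, c_mean)
    v_var = variance_vector(vowel_frames, v_mean)
    # Segment durations in seconds, from the frame count and frame shift.
    c_dur = len(consonant_frames) * frame_shift_s
    v_dur = len(vowel_frames) * frame_shift_s
    # Assumed distance: Euclidean distance between the two mean cepstra.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c_mean, v_mean)))
    return [c_dur, v_dur] + c_mean + v_mean + c_var + v_var + [dist]

# Toy input: 2 consonant frames and 3 vowel frames of 2 coefficients each.
features = cv_syllable_features([[1.0, 2.0], [3.0, 4.0]],
                                [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
```

With D cepstral coefficients per frame, this layout yields a vector of 4D + 3 values per CV-syllable.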
The experiments show that the approach identifies languages effectively: the GMM-based LID system reached a mean correct identification rate of 74.3%, and the LM-based system improved on this with a mean rate of 76.0%. In conventional LID systems based on segmental features, using pseudo-syllables can improve the identification rate.
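The decision step of a GMM-based LID system can be sketched as follows: each language has its own GMM, the CV-syllable feature vectors of an utterance are scored under every model, and the language with the highest total log-likelihood is chosen. This is a generic maximum-likelihood sketch, not the thesis's implementation. Diagonal covariances, the function names, the language labels and the toy model parameters are assumptions, and model training (for example with EM) is omitted.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def identify_language(vectors, models):
    """Return the best-scoring language and the per-language scores."""
    scores = {lang: sum(gmm_log_likelihood(x, gmm) for x in vectors)
              for lang, gmm in models.items()}
    return max(scores, key=scores.get), scores

# Toy one-dimensional models with hypothetical labels and parameters.
models = {
    "lang_a": [(1.0, [0.0], [1.0])],
    "lang_b": [(0.5, [5.0], [1.0]), (0.5, [6.0], [1.0])],
}
best, scores = identify_language([[5.5], [5.2]], models)
```

Summing per-vector log-likelihoods treats the CV-syllables of an utterance as independent, which is a common simplification rather than something the abstract states.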

Keywords/Search Tags:

Language identification, CV-syllable, Feature extraction, Gaussian Mixture Model, Language Model

Related items

1	Research On Language Identification Based On Acoustic And Phonology
2	Identification, Based On The Language Of The Gmm-ubm Model
3	Research On Korean Spoken Language Identification
4	The Research And Implementation Of Deep Learning Based Spoken Language Identification
5	Research On Automatic Language Identification Technology Over Telephone Channel
6	Research On Event Extraction Technique Based On Pre-trained Language Model
7	The Design And Realization Of Programming Model Language
8	Research On Keyword Extraction Method Based On Semantics Features
9	Research On Computer Virus Signature Automatic Extraction Technique
10	The Optimization And Implementation Of The Efficiency And Performance Of Chinese Language Model Based On Recurrent Neural Network