A Study On The Alignment Of Chinese And Vietnamese Bilingual Words

Posted on: 2016-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Mo
Full Text: PDF
GTID: 2208330470970596
Subject: Software engineering
Abstract/Summary:
With the rapid development of information technology, we face the challenge of real-time translation and a massive diversity of information. Relying on manual methods alone to translate these data can no longer meet the demands of cross-lingual activities. Understanding the Chinese and Vietnamese languages is the basis for strengthening cultural exchange between China and Vietnam, and a Chinese-Vietnamese bilingual corpus is an essential resource for that understanding. Word alignment is a core task in NLP. It learns translation equivalences from parallel corpora and serves as the major source of translation knowledge. The results of bilingual word alignment can support speech recognition, bilingual dictionary compilation, information retrieval and other natural language applications; for machine translation and bilingual information extraction, they also have significant commercial value. Research on Chinese-Vietnamese word alignment methods, and the construction of a word-aligned Chinese-Vietnamese corpus of a certain scale, therefore plays an important supporting role in understanding Chinese and Vietnamese. This thesis discusses several bilingual word alignment methods for Chinese and Vietnamese. The main work covers the following three aspects:

(1) We analyze Chinese-Vietnamese bilingual linguistic features and select the feature functions. Starting from the structural and positional differences between the two languages, in particular the different ordering of the attributive modifier and the head word in Vietnamese and Chinese, we select feature functions for Chinese-Vietnamese word alignment. Experimental results suggest that the resulting bilingual word alignment method for Vietnamese and Chinese performs well.

(2) We propose a Vietnamese-Chinese bilingual word alignment algorithm based on feature constraints.
It is difficult to achieve automatic alignment between Vietnamese and Chinese because their syntax and structure are quite different. We therefore present a novel method for Vietnamese-Chinese word alignment that merges a variety of feature constraint models. In this thesis, an improved model based on the Vietnamese-Chinese progressive structure and the offset features of word sequences is described. The model is trained within a log-linear framework, with its parameters tuned by the minimum error rate algorithm, and is then used to produce the Vietnamese-Chinese automatic alignment. The baseline model in the experiments is IBM Model 3. Experimental results suggest that this bilingual word alignment method for Vietnamese and Chinese performs well: precision and recall are increased by 28.57% and 25.02% respectively, and AER is reduced by 14.25%.

(3) We put forward a Chinese-Vietnamese bilingual word alignment method based on a Deep Neural Network (DNN). To capture syntactic structure and lexical translation information, surrounding words are leveraged to model context in bilingual sentences; given a training objective, the network can learn suitable features automatically from raw input data. First, Vietnamese and Chinese words are converted into word embeddings, which serve as the input to the DNN. Second, the HMM model is adapted and extended to integrate context information, yielding a DNN-HMM word alignment model. The baseline models in the experiments are HMM and IBM Model 4. Results on a large-scale Vietnamese-Chinese bilingual word alignment task suggest that this method performs better than the baseline models.
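
The precision, recall and AER figures quoted in the abstract are word-alignment evaluation metrics. The abstract does not say which exact definitions the thesis uses, so the following is only a minimal sketch, assuming the commonly used formulation in which gold-standard links are split into "sure" and "possible" sets. The function name `alignment_scores` and the toy links are hypothetical and chosen for illustration.

```python
from typing import Optional, Set, Tuple

# A link pairs a source-word index with a target-word index.
Link = Tuple[int, int]


def alignment_scores(
    predicted: Set[Link],
    sure: Set[Link],
    possible: Optional[Set[Link]] = None,
) -> Tuple[float, float, float]:
    """Return (precision, recall, AER) for one set of predicted links.

    Assumed definitions (not confirmed by the abstract):
      precision = |A & P| / |A|
      recall    = |A & S| / |S|
      AER       = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    where A = predicted, S = sure, P = possible (P always contains S).
    """
    if not predicted or not sure:
        raise ValueError("predicted and sure link sets must be non-empty")
    # Every sure link also counts as a possible link.
    possible = sure if possible is None else (possible | sure)

    hits_sure = len(predicted & sure)
    hits_possible = len(predicted & possible)

    precision = hits_possible / len(predicted)
    recall = hits_sure / len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / (len(predicted) + len(sure))
    return precision, recall, aer


if __name__ == "__main__":
    # Toy example: four predicted links, three sure gold links,
    # and one extra possible gold link that the system did not predict.
    predicted = {(0, 0), (1, 2), (2, 1), (3, 3)}
    sure = {(0, 0), (1, 2), (2, 1)}
    possible = {(3, 4)}
    p, r, aer = alignment_scores(predicted, sure, possible)
    print(f"precision={p:.4f} recall={r:.4f} AER={aer:.4f}")
```

With these definitions, a lower AER indicates a better alignment, which is why the abstract reports AER as reduced while precision and recall are increased.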
Keywords/Search Tags: Chinese, Vietnamese, Log-linear model, Word Alignment, Deep Neural Network