Research On Word Alignment Method Based On Chinese And Vietnamese Bilingual Parallel Corpus

Posted on: 2018-06-04 | Degree: Master | Type: Thesis
Country: China | Candidate: Y T Niu
GTID: 2358330518961965 | Subject: Software engineering
Abstract/Summary:
In recent years, machine translation has become an important means of overcoming language barriers in communication. Bilingual word alignment is a basic step in automatically acquiring translation knowledge, and in machine translation in particular, a word-aligned corpus is a highly valuable source of that knowledge. A bilingual word-aligned corpus supports Chinese-Vietnamese dictionary compilation, machine translation, speech recognition, information retrieval, semantic disambiguation, bilingual sentence alignment systems, and other natural language processing tasks, and this breadth of use has drawn growing attention to bilingual word alignment. Improving the quality of Chinese-Vietnamese word alignment on the basis of previous studies, and building a large-scale Chinese-Vietnamese word-aligned corpus, is therefore of considerable academic value. Word alignment has achieved good results for language pairs such as Chinese-English and French-English, but research on Chinese-Vietnamese word alignment remains rare. This thesis examines the factors that affect the quality of Chinese-Vietnamese word alignment and analyzes the problems that arise during alignment. Drawing on the linguistic features of Vietnamese and on existing research, it makes the following contributions:

(1) A chunk-based method for Chinese-Vietnamese word alignment. To improve the accuracy of Chinese-Vietnamese word alignment and to alleviate the asymmetry between the two languages during alignment, we build a Chinese-Vietnamese bilingual chunk-aligned corpus. Using this chunk-aligned corpus together with the linguistic features of Vietnamese, we perform word alignment with a conditional random field (CRF) model.

(2) A Chinese-Vietnamese word alignment algorithm that incorporates semantic information. Because low-frequency words have a high alignment error rate, we propose a word similarity model, trained with a neural network model on a monolingual corpus, and use it to extend the IBM word alignment models. A version of GIZA++ that integrates this word similarity model then aligns Chinese and Vietnamese words.

(3) An ensemble of word alignment models. Following the idea of ensemble learning, we treat the word2vec-based alignment model that uses semantic information and the chunk-based alignment model as independent alignment classifiers, and fuse multiple word alignment models with simple voting and weighted voting strategies to further improve alignment quality. This setup also allows the three word alignment methods to be evaluated against one another.
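Contribution (1) aligns chunks first and then aligns words with a CRF model; the CRF itself is not reproduced here. As a hedged sketch of one ingredient such a setup plausibly relies on, the Python code below assumes chunk pairs are given as token spans and keeps only the word links that stay inside an aligned chunk pair. This shrinks the candidate space that any word-level model, a CRF included, has to label. The sentence pair, the chunk spans, the underscore-joined Vietnamese multi-syllable words, the baseline links, and the function names are all hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # half-open token range [start, end) within one sentence
Link = Tuple[int, int]  # (Chinese token index, Vietnamese token index)


def allowed_links(chunk_pairs: List[Tuple[Span, Span]]) -> Set[Link]:
    """Every word link that stays inside an aligned chunk pair (hypothetical constraint)."""
    allowed: Set[Link] = set()
    for (zh_start, zh_end), (vi_start, vi_end) in chunk_pairs:
        for i in range(zh_start, zh_end):
            for j in range(vi_start, vi_end):
                allowed.add((i, j))
    return allowed


def constrain_to_chunks(links: Set[Link], chunk_pairs: List[Tuple[Span, Span]]) -> Set[Link]:
    """Drop candidate word links that cross chunk boundaries."""
    return links & allowed_links(chunk_pairs)


# Toy, pre-segmented sentence pair (Vietnamese multi-syllable words joined by "_").
zh = ["我", "喜欢", "越南", "菜"]
vi = ["tôi", "thích", "món_ăn", "Việt_Nam"]
# Hypothetical chunk alignment: the noun phrase 越南菜 / món_ăn Việt_Nam is aligned
# as one unit even though its internal word order is reversed.
chunks = [((0, 1), (0, 1)), ((1, 2), (1, 2)), ((2, 4), (2, 4))]
# Hypothetical baseline links; (3, 0) = 菜 -> tôi is spurious.
noisy = {(0, 0), (1, 1), (2, 3), (3, 2), (3, 0)}

kept = constrain_to_chunks(noisy, chunks)
for i, j in sorted(kept):
    print(zh[i], "->", vi[j])
```

The spurious link 菜 -> tôi crosses a chunk boundary and is dropped, while the reversed order inside the noun phrase (Chinese modifier before head, Vietnamese head before modifier) is left for the word-level model to resolve.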
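Contribution (2) extends the IBM models with a word similarity model learned from monolingual text, and the abstract does not give the exact formulation. The Python sketch below assumes one simple reading: train plain IBM Model 1 by EM, then let each source word seen fewer than `min_count` times in the parallel corpus borrow translation probability from its most similar frequent source words, weighted by the cosine similarity of their word vectors. The function names, `min_count`, the interpolation weight `lam`, the toy corpus, and the 2-dimensional vectors are hypothetical; real vectors would come from a model such as word2vec trained on monolingual Chinese text. This is not the GIZA++ integration described in the thesis.

```python
from collections import Counter, defaultdict
from math import sqrt


def train_ibm1(corpus, iterations=10):
    """IBM Model 1 EM. Returns t[(tgt, src)] = p(tgt | src)."""
    tgt_vocab = {w for _, tgt in corpus for w in tgt}
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected link counts c(tgt, src)
        total = defaultdict(float)  # expected counts per source word
        for src, tgt in corpus:
            for v in tgt:
                z = sum(t[(v, s)] for s in src)  # spread v over the source words
                for s in src:
                    c = t[(v, s)] / z
                    count[(v, s)] += c
                    total[s] += c
        t = defaultdict(float, {(v, s): c / total[s] for (v, s), c in count.items()})
    return t


def cosine(u, w):
    dot = sum(a * b for a, b in zip(u, w))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in w))
    return dot / norm if norm else 0.0


def smooth_rare(t, vectors, freq, min_count=2, lam=0.5, k=2):
    """Hypothetical similarity extension: a source word seen fewer than
    `min_count` times borrows translation mass from its k most similar
    frequent source words (cosine over monolingual `vectors`)."""
    table = defaultdict(dict)  # table[src][tgt] = p(tgt | src)
    for (v, s), p in t.items():
        table[s][v] = p
    out = {s: dict(d) for s, d in table.items()}  # frequent words stay unchanged
    frequent = [n for n in vectors if freq[n] >= min_count and n in table]
    for s in vectors:
        if freq[s] >= min_count:
            continue
        ranked = sorted(((cosine(vectors[s], vectors[n]), n) for n in frequent), reverse=True)
        neighbours = [(w, n) for w, n in ranked[:k] if w > 0]
        if not neighbours:
            continue  # nothing reliable to borrow from
        own = table.get(s, {})
        a = lam if own else 0.0  # words absent from the parallel corpus rely on neighbours only
        norm = sum(w for w, _ in neighbours)
        dist = defaultdict(float)
        for v, p in own.items():
            dist[v] += a * p
        for w, n in neighbours:
            for v, p in table[n].items():
                dist[v] += (1 - a) * (w / norm) * p
        out[s] = dict(dist)
    return out


# Toy Chinese -> Vietnamese parallel corpus (hypothetical).
corpus = [
    (["这", "房子"], ["ngôi_nhà", "này"]),
    (["这", "书"], ["cuốn_sách", "này"]),
    (["一本", "书"], ["một", "cuốn_sách"]),
    (["房子"], ["ngôi_nhà"]),
]
# Hypothetical monolingual vectors; 屋子 never occurs in the parallel corpus.
vectors = {"房子": [0.9, 0.1], "屋子": [0.85, 0.2], "书": [0.1, 0.9], "这": [-0.5, 0.3]}
freq = Counter(w for src, _ in corpus for w in src)

t = train_ibm1(corpus)
table = smooth_rare(t, vectors, freq)
best = max(table["屋子"], key=table["屋子"].get)
print(best)  # ngôi_nhà
```

Here 屋子 has no estimate from IBM Model 1 alone; because its vector lies close to that of 房子, it inherits 房子's strong preference for ngôi_nhà. Restricting the smoothing to rare words leaves frequent words, whose IBM estimates are already reliable, untouched.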
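Contribution (3) fuses several alignments by simple and weighted voting. A minimal Python sketch, assuming each model outputs a set of (Chinese index, Vietnamese index) links for a sentence pair and that a link is kept when its share of the (weighted) vote is strictly greater than a threshold. The model names, the weights, the 0.5 threshold, and the link sets are hypothetical; in practice the weights might be derived from each model's accuracy on held-out data, which the abstract does not specify.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

Link = Tuple[int, int]  # (source token index, target token index)


def vote(
    alignments: Dict[str, Set[Link]],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.5,
) -> Set[Link]:
    """Fuse several word alignments by (weighted) majority vote.

    With weights=None every model gets one vote (simple voting); otherwise
    each model votes with its weight (weighted voting). A link is kept when
    its share of the total weight is strictly greater than `threshold`.
    """
    if weights is None:
        weights = {name: 1.0 for name in alignments}
    total = sum(weights[name] for name in alignments)
    score: Dict[Link, float] = defaultdict(float)
    for name, links in alignments.items():
        for link in links:
            score[link] += weights[name]
    return {link for link, s in score.items() if s / total > threshold}


# Hypothetical outputs of three aligners for one sentence pair.
outputs = {
    "giza": {(0, 0), (1, 1), (2, 2), (3, 0)},
    "chunk": {(0, 0), (1, 1), (2, 2), (2, 3)},
    "semantic": {(0, 0), (1, 1), (2, 3), (3, 2)},
}
simple = vote(outputs)
weighted = vote(outputs, weights={"giza": 1.0, "chunk": 2.0, "semantic": 4.0})
print(sorted(simple))    # [(0, 0), (1, 1), (2, 2), (2, 3)]
print(sorted(weighted))  # [(0, 0), (1, 1), (2, 3), (3, 2)]
```

With equal weights a link needs two of the three votes, so (2, 2) survives and (3, 2) is lost. With the hypothetical weights, the most trusted model alone carries (3, 2) and the two weaker models together no longer carry (2, 2); poorly chosen weights can just as easily push the result the other way.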
Keywords: Chinese, Vietnamese, word alignment, lexical similarity model, ensemble learning