Research And Realization Of Multi-Data-Source Automatic Acquisition Of Chinese And Vietnamese Bilingual Parallel Sentences

Posted on: 2015-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: S M X Ruan
Full Text: PDF
GTID: 2208330431478083
Subject: Computer software and theory
Abstract/Summary:
In recent years, statistical bilingual word alignment methods have proven highly effective in Machine Translation (MT). A variety of alignment methods have been studied, including log-linear models for word alignment, Statistical Machine Translation (SMT) models, and linguistic analysis methods. These methods have different characteristics in different domains and respects, and each shows good sentence alignment performance. This thesis studies and implements the automatic acquisition of Chinese-Vietnamese bilingual parallel sentences from multiple data sources, and addresses the technical problems of sentence alignment by proposing an effective Chinese-Vietnamese sentence alignment method. First, a Chinese-Vietnamese sentence alignment system is constructed: parallel corpora are collected and stored in the system, the corpora are preprocessed, a combined and optimized sentence alignment algorithm is applied, and the resulting Chinese-Vietnamese aligned sentences are saved to a sentence alignment database. Second, the proposed sentence alignment method is applied to a variety of Chinese-Vietnamese bilingual data sources (online news data, Chinese-Vietnamese bilingual editions of Lu Xun’s works, the Chinese-Vietnamese bilingual "Journey to ..." work, and everyday Chinese-Vietnamese bilingual literature). Aligned Chinese-Vietnamese parallel sentences are obtained automatically, and the alignment results are analyzed experimentally. Comparing the experimental results across the data sources shows that the alignment system built in this thesis achieves relatively high sentence alignment accuracy on online news text. The thesis also briefly analyzes why the sentence alignment rates are lower for the other data sources.
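The abstract does not specify the combined sentence alignment algorithm, so the following is only a minimal illustrative sketch of one common family of techniques, length-based sentence alignment by dynamic programming. It is not the thesis's method. The names `BEADS`, `length_cost`, and `align`, the penalty values, and the `ratio` parameter (expected target characters per source character) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical bead types (source sentences, target sentences) mapped to
# illustrative penalties; these values are assumptions, not from the thesis.
BEADS = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def length_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Relative mismatch between expected and observed target length (0..1)."""
    expected = src_len * ratio
    return abs(expected - tgt_len) / max(expected, tgt_len, 1)


def align(src: List[str], tgt: List[str],
          ratio: float = 1.0) -> List[Tuple[List[str], List[str]]]:
    """Align two sentence lists by character length with dynamic programming."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            # Try extending the best path ending at (i, j) with each bead type.
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + length_cost(s_len, t_len, ratio)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the aligned groups.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    zh = ["你好。", "今天天气很好，我们去公园散步。"]
    vi = ["Xin chào.", "Hôm nay thời tiết rất đẹp, chúng ta đi dạo công viên."]
    for src_group, tgt_group in align(zh, vi, ratio=3.0):
        print(src_group, "<->", tgt_group)
```

In a sketch like this, `ratio` would have to be estimated from known Chinese-Vietnamese sentence pairs. Purely length-based costs tend to suit closely translated text better than loosely translated text, which is one reason practical systems usually combine them with lexical cues.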
Keywords/Search Tags:bilingual word alignment, bilingual corpus, multiple data sources