
Experimental Research Of Constructing Chinese And Uighur Bilingual Corpus

Posted on: 2008-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: X D Re
Full Text: PDF
GTID: 2178360215482885
Subject: Computer application technology
Abstract/Summary:
Bilingual corpora play an important role in Example-Based Machine Translation (EBMT), translation knowledge acquisition, bilingual dictionary construction, word sense disambiguation, and related areas. Building a large-scale bilingual corpus is the foundation of corpus research, so processing existing bilingual text in order to build such a corpus is an important problem, and bilingual alignment is the key technology in that processing. This thesis introduces the application of bilingual corpora and alignment technology in Chinese-Uighur machine translation, describes methods for constructing a Chinese-Uighur bilingual corpus, and discusses sentence and paragraph alignment in this corpus. It first applies several alignment techniques to the same experimental texts and compares their results statistically, concluding that dictionary-based alignment is the most valuable technique for Chinese-Uighur sentence alignment. Then, considering the defects that the traditional paragraph alignment method based on the carriage-return character shows in practical application, it proposes a segment alignment algorithm based on the numeric information in sentence pairs. To combine the merits of these two methods, a multi-level segment alignment method is proposed. The final experiments show that the method is highly efficient and practical.
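The abstract does not give the details of the number-based segment alignment, so the following is only a minimal sketch of one way such an idea could work, not the thesis's actual algorithm. It assumes that numerals are written with the same digits in both languages, that a sentence pair sharing a unique set of numerals can serve as an anchor, and that anchors appear in the same order in both texts. The function names (`numeral_key`, `find_anchors`, `split_segments`) and the English placeholder sentences are hypothetical.

```python
import re

# Assumption: numerals appear as ASCII digits in both texts.
NUM = re.compile(r"\d+(?:\.\d+)?")


def numeral_key(sentence):
    """Order-independent signature of the numerals in a sentence."""
    return tuple(sorted(NUM.findall(sentence)))


def find_anchors(src, tgt):
    """Pair sentences whose numeral signature is non-empty and unique on each side."""
    def unique_index(sentences):
        positions = {}
        for i, s in enumerate(sentences):
            key = numeral_key(s)
            if key:
                positions.setdefault(key, []).append(i)
        # Keep only signatures that occur exactly once.
        return {k: v[0] for k, v in positions.items() if len(v) == 1}

    s_idx, t_idx = unique_index(src), unique_index(tgt)
    pairs = sorted((s_idx[k], t_idx[k]) for k in s_idx if k in t_idx)

    # Greedy filter: drop anchors that would cross an earlier anchor.
    anchors, last_t = [], -1
    for i, j in pairs:
        if j > last_t:
            anchors.append((i, j))
            last_t = j
    return anchors


def split_segments(src, tgt, anchors):
    """Cut both texts after each anchor pair, yielding parallel segments."""
    segments, ps, pt = [], 0, 0
    for i, j in anchors:
        segments.append((src[ps:i + 1], tgt[pt:j + 1]))
        ps, pt = i + 1, j + 1
    if ps < len(src) or pt < len(tgt):
        segments.append((src[ps:], tgt[pt:]))
    return segments


# Placeholder sentences standing in for Chinese and Uighur text.
src = ["intro", "In 2005 there were 12 schools.", "middle",
       "Growth was 3.5 percent.", "end"]
tgt = ["t-intro a", "t-intro b", "12 schools existed in 2005.",
       "t-middle", "3.5 percent growth.", "t-end"]

for s_seg, t_seg in split_segments(src, tgt, find_anchors(src, tgt)):
    print(len(s_seg), "source sentence(s) <->", len(t_seg), "target sentence(s)")
```

Each resulting segment is a small, independent alignment problem, so a finer-grained aligner (for example a dictionary-based one) could then be run inside every segment. That layering is the general sense in which a multi-level method can combine the two approaches.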
Keywords/Search Tags: Bilingual Corpus, Machine Translation, Paragraph Alignment, Sentence Alignment