The Research Of Sentence Alignment In Chinese-Uighur Bilingual Corpus

Posted on: 2007-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: X H Bi
GTID: 2178360185466263
Subject: Computer application technology
Abstract/Summary:
With the development of computers and the Internet, the use of bilingual (multilingual) parallel corpora has become an important topic in Natural Language Processing. Parallel corpora are valuable for machine translation, bilingual dictionary compilation, word sense disambiguation, and cross-lingual information retrieval.

In building a parallel corpus, alignment at different levels is an essential research topic: linguistic knowledge can be extracted from a parallel corpus only after its texts have been aligned. Alignment is also a necessary phase in the construction of an Example-Based Machine Translation (EBMT) system.

This thesis first introduces the application of bilingual corpora and alignment in Chinese-Uighur machine-aided translation. It then discusses the construction of a Chinese-Uighur bilingual corpus, together with paragraph and sentence alignment in that corpus. Statistical analysis shows a comparatively stable text-length relation between Chinese and Uighur parallel texts, so a length-based sentence alignment method is adopted and implemented with dynamic programming. Because the traditional paragraph alignment method, which relies on carriage-return characters, has defects in practical use, the thesis proposes a segment alignment algorithm based on anchor sentence pairs. To combine the merits of the two methods, a multi-level segment alignment method is proposed. The final experiments show that the method is highly efficient and practical.
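
The length-based sentence alignment with dynamic programming mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's cost function, bead types, or length ratio, so everything below is an assumption made for illustration: the `BEADS` penalties, the `length_cost` formula, the `ratio` parameter, and the use of character counts as sentence length are all hypothetical and are not the author's method.

```python
from typing import List, Tuple

# Allowed alignment beads (source sentences, target sentences) and their
# penalties. These values are hypothetical, chosen only for illustration.
BEADS = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 1.0, (1, 2): 1.0}


def length_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Penalty for how far tgt_len deviates from the expected ratio * src_len."""
    expected = ratio * src_len
    return abs(tgt_len - expected) / (expected + tgt_len + 1) ** 0.5


def align(src: List[str], tgt: List[str],
          ratio: float = 1.0) -> List[Tuple[List[int], List[int]]]:
    """Return the minimum-cost list of (source indices, target indices) beads."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Forward dynamic programming over prefixes src[:i], tgt[:j].
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + length_cost(s_len, t_len, ratio)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)

    # Trace the best path back from the end of both texts.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads
```

In practice the `ratio` argument would be estimated from the corpus itself, for example as the total target length divided by the total source length, which is one way to use the stable length relation the abstract reports.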
Keywords/Search Tags: Machine Translation, Bilingual Corpus, Paragraph Alignment, Sentence Alignment