The Study Of The Alignment Method In The Chinese-English Parallel Corpora

Posted on: 2005-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: D M Liu
GTID: 2168360122988668
Subject: Computer software and theory
Abstract/Summary:
In natural language processing, bilingual parallel corpora are becoming increasingly important. Research on them focuses on construction, alignment, and tagging, and it has significant value for machine translation, dictionary compilation, multilingual information retrieval, and term recognition. Over the past three decades, numerous parallel corpora of European languages have been built, whereas few Chinese-English parallel corpora exist. This thesis concentrates on alignment and word sense disambiguation in Chinese-English parallel corpora. It covers the following work:

1. Content Word Alignment. After reviewing several statistical measures, we developed a hybrid statistical technique adapted to Chinese-English parallel corpora for high-frequency content words, and made full use of dictionary information for low-frequency content words. Finally, we applied the competitive linking algorithm and obtained better results.

2. Chinese-English Bilingual Chunks. Once the content words in the bilingual corpus had been aligned, we identified the chunks in both languages and aligned them at the same time. This avoids disagreement between the chunk boundaries of the two languages.

3. Noun Phrase Correspondences. Based on the statistical characteristics of noun phrases, we used an iterative re-evaluation algorithm for high-frequency noun phrases; for low-frequency noun phrases, our method is similar to the one used for low-frequency content words. This method takes the alignment information as a whole into account and yields results with a high coverage rate.

4. Word Sense Disambiguation. At present, most word sense disambiguation algorithms for parallel corpora are limited to the context of the single ambiguous word and its alignment information. In this thesis, we made full use of the computability of concepts in HowNet and recast word sense disambiguation as a similarity calculation between the ambiguous word and the whole sentence in the other language. This approaches disambiguation from a new angle and achieved satisfactory results.

By making full use of the characteristics of Chinese and English, this thesis completed the alignment of all kinds of information units in Chinese-English parallel corpora. The experimental results are remarkable.
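
For readers unfamiliar with the competitive linking algorithm mentioned under Content Word Alignment, the following is a minimal sketch of its usual greedy, one-to-one form. It is not the thesis's implementation: the function name, the threshold parameter, the romanized toy words, and the association scores are all illustrative assumptions, and the thesis's own hybrid statistical scores and dictionary information are not reproduced here.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def competitive_linking(
    scores: Dict[Pair, float], threshold: float = 0.0
) -> List[Tuple[str, str, float]]:
    """Greedily link word pairs, best score first, one link per word.

    `scores` maps (source_word, target_word) to an association score.
    How the score is computed is left open; any measure where larger
    means "more likely a translation pair" works for this sketch.
    """
    links: List[Tuple[str, str, float]] = []
    used_source = set()
    used_target = set()

    # Visit candidates from highest to lowest score; ties are broken by
    # the pair itself so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    for (source, target), score in ranked:
        if score <= threshold:
            break  # every remaining candidate is at or below the threshold
        if source in used_source or target in used_target:
            continue  # one of the two words is already linked
        links.append((source, target, score))
        used_source.add(source)
        used_target.add(target)
    return links


# Hypothetical scores for one sentence pair (romanized Chinese to English).
toy_scores = {
    ("yinhang", "bank"): 0.9,
    ("yinhang", "rate"): 0.2,
    ("lilv", "rate"): 0.7,
    ("lilv", "bank"): 0.3,
    ("tigao", "raise"): 0.6,
    ("tigao", "rate"): 0.4,
}

links = competitive_linking(toy_scores)
for source, target, score in links:
    print(source, target, score)
```

Because each word may take part in at most one link, a strong pair such as ("yinhang", "bank") blocks weaker competitors like ("lilv", "bank"). This is the property that makes the procedure suitable as a final step after candidate pairs have been scored.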
Keywords/Search Tags: Bilingual Parallel Corpus, Word Alignment, Noun Phrase Correspondences, Bilingual Chunk, Word Sense Disambiguation