Chinese-English Bilingual Corpus Automatically Aligned

Posted on: 2000-09-07; Degree: Doctor; Type: Dissertation
Country: China; Candidate: B Wang; Full Text: PDF
GTID: 1118360185495558; Subject: Computer system architecture
Abstract/Summary:
Natural Language Processing (NLP) is the science that deals with the morphology, pronunciation, and meaning of natural languages. As computers become more widespread, society is turning into an information society, communication between people grows ever more frequent, and the demand for NLP technology is rising accordingly.

NLP has two basic research methods: rationalism and empiricism. In practice they are also called the rule-based method and the corpus-based method, respectively. Because the strengths of each offset the weaknesses of the other, the two methods are usually combined in current NLP research. Most current research emphasizes extracting linguistic knowledge from large-scale corpora and then applying that knowledge, in the form of rules, to NLP tasks.

A corpus is monolingual or multilingual according to the number of languages it contains. A monolingual corpus contains texts in one language only, while a multilingual corpus contains texts in more than one. A typical multilingual corpus is the bilingual corpus, which contains texts in two languages that are translations of each other. Because it carries translation information between the two languages, a bilingual corpus provides very valuable information for bilingual research such as machine translation and bilingual lexicography. Creating bilingual corpora has therefore become one of the most important topics in the NLP field.

The key technology for creating bilingual corpora is alignment. The text alignment problem may be stated succinctly as follows: given two texts that are mutual translations, automatically calculate the correspondences between their respective segments. Concretely, this means identifying, for each segment in one text, the segment in the other text that is its translation. The nature of the segments determines the resolution of the alignment: sections, paragraphs, sentences, words, bytes, etc. Different NLP applications need bilingual corpora aligned at different levels.

Chinese and English are two of the most typical languages in the world...
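
To make the alignment problem stated above concrete, here is a minimal sketch of sentence-level alignment by dynamic programming over segment lengths. It is only an illustration of the general idea, not the method developed in the dissertation. The function names (`length_cost`, `align`), the `ratio` parameter (expected target-to-source character-length ratio), the cost formula, and the penalty values are all hypothetical choices made for this example; a real Chinese-English aligner would need tuned values and probably lexical evidence as well.

```python
from typing import List, Tuple

# Allowed alignment patterns: (source sentences used, target sentences used).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]

# Hypothetical fixed penalties per pattern (illustrative, not tuned).
PENALTY = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 1.0, (1, 2): 1.0}


def length_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Cost in [0, 1) that grows as tgt_len departs from ratio * src_len."""
    expected = src_len * ratio
    return abs(tgt_len - expected) / (expected + tgt_len + 1.0)


def align(src: List[str], tgt: List[str],
          ratio: float = 1.0) -> List[Tuple[List[int], List[int]]]:
    """Return a minimum-cost list of (source indices, target indices) pairs."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Forward pass: cost[i][j] is the best cost of aligning src[:i] with tgt[:j].
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + length_cost(s_len, t_len, ratio) + PENALTY[(di, dj)]
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)

    # Trace back from the end to recover the chosen patterns.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        pairs.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    pairs.reverse()
    return pairs
```

Each returned pair lists the indices of the source sentences and the target sentences judged to be translations of each other; an empty list on one side marks a sentence with no counterpart. Changing what counts as a segment (paragraphs, sentences, words) changes the resolution of the alignment without changing the overall scheme.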
Keywords/Search Tags:Natural Language Processing, Corpus, Bilingual Corpora, Alignment