The Experimental Study And Realization Of Mongolian-Chinese Alignment Corpora

Posted on: 2010-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: G X Zhang
Full Text: PDF
GTID: 2178360278451316
Subject: Computer application technology
Abstract/Summary:
In natural language processing, bilingual aligned corpora are becoming increasingly important. They have research and application value in machine translation, dictionary compilation, information retrieval, translation knowledge acquisition, and term recognition. Research on bilingual aligned corpora focuses mainly on construction, alignment, and tagging. Over the past three decades, numerous parallel corpora of European languages have been built. By contrast, few Chinese-English aligned corpora exist, and corpora that align Mongolian with other languages are rarer still. This thesis concentrates on part-of-speech tagging, vocabulary alignment, and syntactic analysis of a Mongolian-Chinese aligned corpus. The work comprises the following:

1. Part of Speech Tagging. Mongolian makes heavy use of affixes. For example, the verb meaning "to do" changes form morphologically across the active, passive, causative ("dynamic"), reciprocal ("inter-dynamic"), and cooperative ("iso-dynamic") voices. To tag the corpus, we therefore defined tags for these morphological changes in addition to the part-of-speech tag set.

2. Vocabulary Alignment. The aim of vocabulary alignment is to find, for each source-language unit, the target-language unit with the highest semantic similarity. Mongolian and Chinese vocabulary stand in a number of special correspondence relationships: some Chinese verb phrases correspond to Mongolian verbs, and some Mongolian verb phrases correspond to Chinese verbs; a Mongolian numeral often corresponds to a Chinese quantifier (numeral-classifier) phrase; and some words have no counterpart at all (empty alignment). We analysed the semantic relationships between Mongolian and Chinese sentences in detail, annotated their characteristics, and implemented a multi-function alignment information retrieval tool that covers bilingual sentences, vocabulary, syntactic structure, and so on.

3. Syntactic Analysis. Syntactic analysis determines the structure of sentences and phrases, and it is a central topic in automatic natural language processing. We use a top-down parsing method to analyse the experimental bilingual corpus, and use the generalized table to generate the syntactic trees of the Mongolian sentences and the Chinese sentences respectively.

Finally, we built an experimental Mongolian-Chinese bilingual corpus system whose annotation covers vocabulary alignment, part of speech, sentence components, and syntactic structure. The system also provides alignment information retrieval and corpus maintenance functions. Analysis of typical examples from the corpus indicates that this work is of value for machine translation and for the automatic acquisition of translation knowledge.
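
The top-down parsing step mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy grammar, the tag names (`N`, `NUM`, `V`), and the function names `expand`, `parse`, and `to_bracket` are hypothetical and are not taken from the thesis. The sketch also assumes that the "generalized table" can be read as a nested bracketed list, which may differ from the structure the author actually used.

```python
from typing import Dict, Iterator, List, Optional, Tuple, Union

# A tree is either a terminal tag (str) or a nested list [label, child, ...].
Tree = Union[str, list]

# Hypothetical toy grammar over part-of-speech tags (NOT the thesis's rules).
# It must not be left-recursive, or this naive top-down parser will not terminate.
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["NUM", "N"], ["N"]],
    "VP": [["NP", "V"], ["V"]],  # verb-final order, as in Mongolian (SOV)
}


def expand(symbol: str, tags: List[str], pos: int) -> Iterator[Tuple[Tree, int]]:
    """Expand `symbol` top-down starting at tags[pos].

    Yields (tree, next_pos) for every way the symbol can match the input.
    """
    if symbol not in GRAMMAR:
        # Terminal symbol: it must equal the tag at the current position.
        if pos < len(tags) and tags[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Match the right-hand side left to right, keeping every partial parse.
        partial: List[Tuple[list, int]] = [([symbol], pos)]
        for child in rhs:
            partial = [
                (tree + [sub], nxt)
                for tree, p in partial
                for sub, nxt in expand(child, tags, p)
            ]
        yield from partial


def parse(tags: List[str], start: str = "S") -> Optional[Tree]:
    """Return the first tree that covers the whole tag sequence, or None."""
    for tree, end in expand(start, tags, 0):
        if end == len(tags):
            return tree
    return None


def to_bracket(tree: Tree) -> str:
    """Render a nested-list tree in bracketed form, e.g. S(NP(N),VP(V))."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return f"{label}({','.join(to_bracket(c) for c in children)})"
```

Under these assumptions, calling `parse(["N", "NUM", "N", "V"])` yields a nested list that `to_bracket` renders as `S(NP(N),VP(NP(NUM,N),V))`. A real system would parse the Mongolian and Chinese sides with separate grammars and tag sets.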
Keywords/Search Tags: bilingual alignment corpora, vocabulary alignment, part of speech, sentence component, syntactic structure