
Research On Automatic Acquisition Of Semantic Element

Posted on: 2009-07-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Fang
Full Text: PDF
GTID: 1118360242484628
Subject: Computer application technology
Abstract/Summary:
Machine translation has long been a dream of mankind and remains a difficult problem worldwide. No fully satisfactory machine translation system has been developed so far. The difficulty lies mainly in the complexity of linguistic knowledge and in our limited understanding of the rules of human languages. Work on machine translation has shown that high-quality translation requires semantic analysis and understanding of sentences. Machine translation based on the Semantic Element (SE) analyzes source-language sentences semantically with reference to a Semantic Element Base and converts them into the form of the target language. Its core task is therefore the construction of a large-scale Semantic Element Base.

The Semantic Element Base is a large collection of SEs, which can only be obtained by extraction from natural language. Manual extraction is time-consuming and labor-intensive, and it becomes a bottleneck because a consistent extraction standard is hard to maintain. Automatic extraction is a good way to solve this problem. The keys to extracting SEs automatically from bilingual sentence pairs are acquiring the SE structure and matching the two natural-language forms (Semantic Element Representation, SER) of the same SE. This dissertation addresses these problems through three lines of research:

1. A semi-automatic SE extraction method based on SE tree restructuring. The words that form an SER in each language follow certain statistical regularities, namely co-occurrence, and the structure of an SE follows certain regular patterns. On this basis, the method acquires the semantic structure of sentences and the correspondence between the SERs of the source and target languages. It first searches a preset Semantic Element Base for the semantic elements that form the sentence meaning, or for the abandonable semantic elements. It then restructures the sentence meaning and infers new SEs according to the formation rules of SEs. Newly inferred SEs are added to the Semantic Element Base after manual review. Acquiring semantic elements through this iterative accumulation provides a feasible way to build the Semantic Element Base.

2. An automatic SE extraction method based on transformation and mapping. The method first uses link grammar and transformation for lower-level semantic structure analysis of English, and uses statistical word alignment to find the correspondence between the source and target languages. Drawing on the features of SEs and on experience from manual extraction, a set of rules was developed to process the parsing results of English sentences, so as to establish the semantic layer of each word and transform it from the semantic level to its English form. Statistical word alignment is then used to map the English SER to the target language. Finally, the extraction results are optimized through constant competition, integration, and combination of SERs. The method extracts SEs automatically without the support of an existing SE base and tolerates a certain amount of word alignment error.

3. An automatic SE extraction method based on statistical prioritization and decomposition. The method first uses analogy to decompose the sentence meaning of bilingual sentence pairs and acquire its semantic structure while maintaining the bilingual correspondence. It then prioritizes the SE decomposition results with a multi-feature decision-making strategy based on three statistical indices: a parameter index, a language model index, and a translation model index. The abandonable SEs that are generated can be decomposed further by iteration. The method does not depend on any grammar, but its accuracy is moderate.
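The abstract relies on word co-occurrence across bilingual sentence pairs to find correspondences between source and target forms. The sketch below illustrates that general idea only. It uses the Dice coefficient, a common association measure chosen here for illustration, not the dissertation's actual alignment model. The function names `dice_scores` and `best_match` and the romanized toy corpus are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Score every (source word, target word) pair by sentence-level co-occurrence.

    Dice(s, t) = 2 * count(s and t in the same pair) / (count(s) + count(t)).
    This is an illustrative stand-in, not the method used in the dissertation.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Use sets so a word is counted at most once per sentence pair.
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }


def best_match(scores, src_word):
    """Return the target word with the highest score for src_word, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == src_word}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical toy corpus: English sentences paired with romanized Chinese.
pairs = [
    ("i eat apples", "wo chi pingguo"),
    ("i eat rice", "wo chi mifan"),
    ("you like apples", "ni xihuan pingguo"),
]
scores = dice_scores(pairs)
print(best_match(scores, "apples"))
```

On a corpus this small, many word pairs tie (for example, "eat" co-occurs equally often with "wo" and "chi"), which is one reason practical systems combine co-occurrence with further evidence such as the structural rules and indices described above.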
Keywords/Search Tags: Machine Translation, Semantic Element, Automatic Acquisition