Research On The Representation And Application Of Chinese Context In Chinese-English Machine Translation

Posted on: 2003-09-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H M Ma
Full Text: PDF
GTID: 1118360092998831
Subject: Computer Science and Technology
Abstract/Summary:
In this thesis, we study the representation and application of Chinese context in Chinese-English machine translation, and apply them in ICENT, an interlingua-based Chinese-English machine translation system.

On the basis of Chinese syntactic and semantic analysis, we construct a model of Chinese context, called the Chinese Context Model (CCM). CCM is a knowledge system consisting of three parts: the representation, acquisition and management of Chinese context knowledge. Because Chinese context is dynamic, CCM represents context knowledge as a structural semantic network, called the Concept Information Unit Relation Network (CIURN). The acquisition and management of context knowledge in CCM are likewise dynamic processes. CCM is systematic, practical, dynamic, structural and expandable. Its context knowledge is made up of syntactic information, semantic information and coherence information, which makes it useful for Chinese analysis. In CCM, the acquisition of context knowledge is combined with Chinese analysis. CCM thus provides a basic framework for the representation and application of Chinese context.

With the context knowledge in CCM, we study temporal information analysis of Chinese events, subject ellipsis resolution in Chinese sentences and definiteness judgment of Chinese noun phrases.

We construct a Temporal Information Frame of Chinese Event (TIFCE), which is composed of the tense and aspect of a Chinese event. On the basis of CCM, we propose an approach to analyzing the temporal information of Chinese events. It resolves temporal reference in Chinese and acquires the temporal information of Chinese events. We analyze Chinese time phrases and calculate their tenses. The tenses of Chinese events are then derived from the tenses of the time phrases. We acquire the aspects of Chinese events by matching Chinese event aspect templates.
To obtain the information needed to generate the correct English verb tense, we construct a mapping relation t that transforms the temporal information in TIFCE into English verb tense. We test our approach on simplified news reports and obtain satisfactory results.

It is very difficult to generate correct English sentences from Chinese sentences that omit their subjects. We therefore handle subject ellipsis in Chinese sentences using the context knowledge in CCM. The process consists of detecting the elided subject and recovering it. We propose and implement a method to detect subject ellipsis in the semantic structure of a sentence, and a "candidate-selecting" strategy that recovers the elided subject using syntactic, semantic and context conditions. The experimental results are satisfactory.

We propose a method for judging the definiteness of Chinese noun phrases in order to obtain the information needed to generate the definite article. We define a set of definite reference relations between Chinese noun phrases. Then, with the context knowledge in CCM, we present a "candidate-comparing" method to establish the definite reference relations between noun phrases. A Chinese noun phrase is judged definite once such a relation has been established. We implement this method and obtain good results when the Chinese noun phrases are specific.

With CCM and the methods discussed above, we improve the ICENT system. The original system performed syntactic and semantic analysis only within a single sentence. ICENT can now analyze sentences with the context knowledge in CCM. It has three sub-systems that process temporal information, subject ellipsis and the definiteness of Chinese noun phrases. It can handle English verb tense generation, the generation of elided subjects and definite article generation in Chinese-English machine translation. Moreover, the interlingua of ICENT is also improved.
It enhances the representation of context information for English generation.

In summary, we construct CCM, a model of Chinese context, and study temporal information of Chinese events, sub...
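
As an illustration of the idea of mapping a TIFCE entry to an English verb tense, the following is a minimal sketch and not the thesis's actual mapping relation. The abstract states only that TIFCE is composed of the tense and aspect of a Chinese event; the category labels, the `TemporalFrame` class and the `english_verb_tense` function are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels: the abstract says only that TIFCE holds the tense and
# aspect of a Chinese event. These category names are assumptions.
TENSES = {"past", "present", "future"}
ASPECTS = {"simple", "progressive", "perfect"}


@dataclass(frozen=True)
class TemporalFrame:
    """Toy stand-in for a TIFCE entry: one tense plus one aspect."""
    tense: str
    aspect: str

    def __post_init__(self):
        # Reject labels outside the assumed inventories.
        if self.tense not in TENSES:
            raise ValueError(f"unknown tense: {self.tense}")
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect}")


def english_verb_tense(frame: TemporalFrame) -> str:
    """Map a (tense, aspect) pair to the name of an English verb tense."""
    if frame.aspect == "simple":
        return f"simple {frame.tense}"
    return f"{frame.tense} {frame.aspect}"


if __name__ == "__main__":
    print(english_verb_tense(TemporalFrame("past", "perfect")))
    print(english_verb_tense(TemporalFrame("present", "progressive")))
```

In this sketch the mapping is a pure function of the two fields, so any context-dependent work (resolving time phrases, matching aspect templates) would have to happen before the frame is built.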
Keywords/Search Tags: natural language processing, Chinese-English machine translation, context knowledge, temporal information, ellipsis, reference, interlingua