
Resolving and Generating Zero Anaphors in Chinese Discourse: A Corpus-Based Centering Approach

Posted on: 2007-12-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: N Y Xu
GTID: 1115360212484364
Subject: English Language and Literature
Abstract/Summary:
Anaphora is one of the most frequently encountered phenomena in natural language, and its resolution and generation are important to discourse understanding and production. Zero anaphors are widely used in Chinese discourse, and they are difficult to resolve or generate because they can occur in any argument position and their intended antecedents may occupy any grammatical slot. A variety of approaches to Chinese zero anaphora resolution and generation have been proposed, the most prominent being the syntactic approaches (Huang J., 1984, 1989; Xu L. J., 1986), the discourse-functional approaches (Li and Thompson, 1979, 1981; Chen, 1986; Xu J. J., 1990, 2003; Tao, 1993, 1997; Cheng, 1990; Lee, 1990, 1995; You, 1998; Xu Y. L., 1995, 2004), the pragmatic approach (Huang Y., 1994), and the cognitive approach (Tomlin and Pu, 1991). These approaches offer different ways of resolving and generating Chinese zero anaphors. However, most of them are confined to the stage of "interpretation" and so do not amount to "resolution" or "generation" in the real sense. Moreover, they are not practical or explicit enough to be applied to computer processing.

This dissertation attempts to develop computational models for resolving and generating zero anaphors in Chinese discourse. Centering Theory (Grosz et al., 1995; Walker et al., 1998, inter alia), which originated as a model of discourse structure, has served as one of the major anaphora resolution and generation models in computational linguistics. To test the cross-linguistic applicability of the constraints and rules stipulated in Centering Theory, many researchers have applied them to anaphora resolution and generation in various languages, and some have applied Centering Theory to zero anaphora resolution and generation (Kameyama, 1985, 1986, 1988, 1998; Walker, Iida, and Cote, 1990, 1994; Mitsuko et al.
2001; Turan, 1995, 1998; Di Eugenio, 1990; Rambow, 1993; Ryu, 2001; Prasad, 2003; Prince, 1994). In China, however, very few researchers have employed Centering Theory as a framework for analyzing anaphora resolution or generation. The only works on Centering Theory we found in the Chinese linguistic literature are Miao (2003) and Wang (2004). Miao (2003) presented a brief review of Centering Theory but did not pursue any further study of Chinese discourse. Wang (2004) applied Centering Theory to Chinese zero anaphora resolution, but his approach, which is mainly based on the Global Model of Iida (1998), left many details unelaborated. For this reason, we attempt a thorough and comprehensive study of the application of Centering Theory to the analysis of Chinese discourse, particularly Chinese zero anaphora resolution and generation.

This dissertation takes Centering Theory as its theoretical basis. The first computational model we develop is a model for Chinese zero anaphora resolution, called the Revised Integrated Cache Model (RICM). It is an improvement on the Integrated Cache Model proposed by Walker (1996). The model draws on the "anti-stack" notion of the Cache (Walker, 1996) and incorporates the Recovery Principles of Cheng (1990) and Lee (1990, 1995), because lexical semantics can serve as ideal retrieval cues for activating the tracking of referents. In doing this, we revise Centering Rule 1 and formulate six further rules: the Cf Ranking rule, the Cf Promotion rule I, the Cf Promotion rule II, the Cf Transfer rule, the Cf Deletion rule, and the Cf Displacement rule. Based on these rules, we propose a computational model (RCM) as well as an algorithm (RICM) for resolving Chinese zero anaphors.
Compared with the Stack Model, the Global Model, and the Cache Model, our model has the advantages of resolving cross-utterance zero anaphors without recourse to a separate global list and of solving the problem of lower-ranking entities serving as Cb.

To test the feasibility of our resolution algorithm, we conduct an experiment, which shows that 95% of the zero anaphors in the data are successfully resolved. The proposed algorithm is thus feasible and efficient, with a correctness rate of 95%.

The second computational model we develop is a model for Chinese zero anaphora generation, formulated by taking the Centering Transitions as a constraint on the distribution of anaphoric expressions. This is regarded as an efficient way of generating anaphoric expressions (Turan, 1995; Kim, 1999; Ryu, 2000). The Centering Transitions are extracted through a corpus study, and the generation algorithm we finally develop is also tested on real corpus data. The result shows that the proposed generation algorithm has a high correctness rate (about 96.75%) and can be considered an efficient zero anaphora generation algorithm for Chinese.

Considering the language-specific features of Centering Theory, we set the relevant parameters for centering analysis of Chinese discourse. These parameters include utterance specification, discourse segmentation, and the ranking of forward-looking centers.

The utterance is an essential and basic linguistic unit for discourse organization.
Based on previous approaches (Li, 1956; Hu, 1981; Huang & Liao, 1981; Mann and Thompson, 1987; Crystal, 1991; Zhu, 1995; Poesio, 1995; Traum & Heeman, 1996; Bussmann, 1996; Chu, 1998; Kameyama, 1998; Aronoff & Rees-Miller, 2001; Song, 2001; Xu, 2003), we develop a working definition of utterance, which we assume to be more appropriate for Centering analysis of Chinese discourse in that it conforms to the features of Chinese clauses, the orientation of Centering analysis, and the convenience of computer processing.

A discourse can be analyzed as a structure of discourse segments (Grosz and Sidner, 1986); however, "segmenting discourse is an active research area in itself, and there are no texts with agreed upon discourse structures" (Di Eugenio, 1998:116). Based on the notion of topic continuity (Cheng, 1990) and the need to avoid the occurrence of Nil and NO Cb, we develop a working definition of discourse segment in Chinese. This definition has four advantages: 1) it avoids the excess of Nil and NO Cb produced by oversegmenting the discourse, thus allowing more transition types to be used in determining the distribution of referring expressions; 2) it addresses the problem of how centering interacts with global discourse structure and allows centering to be tested on extended discourses; 3) it allows inferables to serve as potential referents for anaphoric reference in subsequent utterances, which lets the segmentation proceed smoothly; 4) it is especially suited to the centering analysis of Chinese discourse, since in naturally occurring Chinese discourse cross-sentence and cross-paragraph reference are not uncommon, and zero pronouns, pronouns, and full NPs can in some cases be used interchangeably.

The ranking of the forward-looking centers list varies from language to language, and the factors that determine the Cf hierarchy have not yet been completely specified
in the centering literature. Based on the notions of 'topic' (Chao, 1968) and 'topic-prominence' (Li & Thompson, 1979), as well as the Accessibility Hierarchy of Chen (1984), we tentatively specify the Cf ranking hierarchy for Chinese. To test the feasibility of this ranking method, we conduct an empirical study, the result of which lends support to our proposal. We also examine other factors contributing to the salience of entities, such as the occurrence of the existential-presentative construction (EPC) and the involvement of high intentionality and control.

To further improve the coverage of our ranking hierarchy, we also examine how to rank complex NPs. Based on the view of Tetreault (2001) as well as the approaches of Walker and Prince (1995), Gordon et al. (1999), and Hobbs (1978), we propose a ranking method for complex NPs in Chinese, which we assume to be less radical and hence more reasonable in dealing with the ranking of Cfs for complex NPs in Chinese discourse.

Since the computation of transition types is essential to our study, especially to the generation of zero anaphors, we explore it in more depth. By integrating the definition of Fais (2004) with the approach of Strube and Hahn (1999), we designate 18 transition types. These transitions are assumed to be more fine-grained in classification and more consistent in terms of inference cost, and in particular they can handle the transition specification for inferable centers. Moreover, these transitions can be used to further improve the efficiency of both our proposed zero anaphora generation algorithm and our proposed zero anaphora resolution algorithm.

Through these analyses and their implementations, we hope this research will lead to a better understanding of the characteristics of anaphora, one of the important phenomena that natural languages exhibit, and will contribute to the computer processing of Chinese.
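
For orientation, the transition computation that the abstract builds on can be sketched with the four standard Centering transitions (Continue, Retain, Smooth-Shift, Rough-Shift) from the general centering literature. This is a minimal illustration only. It is not the dissertation's 18-type scheme and not its RICM algorithm. The function names, the NO-CB label, and the toy entity lists are all assumptions made for this example, and each Cf list is assumed to be already ranked by salience.

```python
from typing import List, Optional


def backward_center(cf_prev: List[str], cf_cur: List[str]) -> Optional[str]:
    """Cb(Un): the highest-ranked entity of Cf(Un-1) that is realized in Un."""
    for entity in cf_prev:  # cf_prev is assumed to be ordered by salience
        if entity in cf_cur:
            return entity
    return None


def transition(cb_prev: Optional[str], cb_cur: Optional[str],
               cp_cur: Optional[str]) -> str:
    """Classify one utterance pair using the four standard transitions."""
    if cb_cur is None:
        return "NO-CB"  # hypothetical label for an utterance with no Cb
    # The Cb counts as maintained if it is unchanged or was previously undefined.
    cb_maintained = cb_prev is None or cb_cur == cb_prev
    if cb_maintained:
        return "CONTINUE" if cb_cur == cp_cur else "RETAIN"
    return "SMOOTH-SHIFT" if cb_cur == cp_cur else "ROUGH-SHIFT"


def classify_discourse(cf_lists: List[List[str]]) -> List[str]:
    """Label every adjacent utterance pair in a segment."""
    labels = []
    cb_prev = None
    for cf_prev, cf_cur in zip(cf_lists, cf_lists[1:]):
        cb_cur = backward_center(cf_prev, cf_cur)
        cp_cur = cf_cur[0] if cf_cur else None  # Cp: top-ranked Cf
        labels.append(transition(cb_prev, cb_cur, cp_cur))
        cb_prev = cb_cur
    return labels


# Toy segment: one ranked Cf list per utterance (invented for illustration).
segment = [
    ["Zhangsan", "book"],
    ["Zhangsan", "shop"],
    ["shop", "Zhangsan"],
    ["shop", "owner"],
]
print(classify_discourse(segment))
```

In a generation setting of the kind the abstract describes, labels such as these would act as a constraint on the choice among zero pronouns, overt pronouns, and full NPs. The actual mapping from transitions to referring expressions is derived from the author's corpus study and is not reproduced here.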
Keywords/Search Tags: Centering Theory, Chinese zero anaphors, resolution, generation