
Research Of Key Issues In English Discourse Structure Analysis

Posted on: 2014-09-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Xu
GTID: 1268330431973250
Subject: Computer application technology
Abstract/Summary:
Over the past three years, Discourse Structure Analysis (DSA) has attracted considerable attention in computational linguistics (according to statistics, ACL, COLING and EMNLP publish at least 8 papers from more than 30 DSA-related submissions each year). DSA is regarded as the next hot topic after traditional information extraction/information retrieval, machine translation and syntactic/semantic analysis.

DSA aims to investigate the internal structure of natural language text and to understand the semantic relationships between text units, each of which can be a word, a phrase, a clause, a sentence or even a paragraph, and it requires analysing the overall structure of these units. DSA can therefore extract the rich structural information within texts, and it plays an important role in both Natural Language Processing (NLP) and Natural Language Generation (NLG). Generally speaking, mainstream DSA methods have focused on lexical information in the discourse, such as tokens, token morphology or token pairs. The attitude of a sentence and the cohesion mechanisms among the sentences of a discourse, however, are often ignored. As a result, the performance of current DSA systems remains unsatisfactory.

Against this background, this dissertation focuses on the following three key problems in DSA raised in the computational linguistics community:

1. Implicit Discourse Relation Recognition (IDRR). Building on research into word pair-based, language model-based and tree kernel-based IDRR models, we present an IDRR model based on attitude prosody theory. Our model recognizes implicit discourse relations by calculating sentence-level attitude/sentiment information, and it also integrates a dependency word pair tree structure via a composite kernel. Evaluation on the Penn Discourse Treebank (PDTB) 2.0 shows the importance of the attitude prosody theory-based IDRR model.
It also shows that our model significantly outperforms other current models, e.g. word pair-based, language model-based and tree kernel-based models.

2. Discourse Argument Identification (DAI). This dissertation addresses DAI in both the intra-sentence case, where the connective and its arguments are located in the same sentence, and the inter-sentence case, where the connective and its arguments are located in different sentences. For the intra-sentence case, building on research into chunking-based, classification-based and syntactic tree subtraction-based models, we present a model based on a shallow semantic parsing framework. Our model recasts the discourse connective as the predicate and its scope as several constituents that serve as the arguments of the predicate. Unlike state-of-the-art chunking approaches, our parsing approach extends DAI from the chunking level to the parse tree level, where rich syntactic information is available, and focuses on determining whether a constituent, rather than a token, is an argument. For the inter-sentence case, we present a lightweight heuristic rule-based solution that takes two spans as the discourse arguments of the connective: the word sequence between the connective and the end of the current sentence, and the sentence immediately preceding the connective. Evaluation on PDTB shows the effectiveness of our shallow semantic parsing framework-based model. It also shows that our model significantly outperforms current chunking-based models.

3. Discourse Coherence Modeling (DCM). Building on research into entity-based and discourse relation-based models, we present a discourse coherence model based on theme-rheme structure cohesion theory. Our model describes discourse coherence by calculating the similarity between the themes or rhemes of sentences, and it also integrates two rule-based coherence filtering mechanisms, one based on theme structure and one based on coreference.
Evaluation on five different benchmark data sets reveals the effectiveness of our cohesion theory-driven discourse coherence model. It also shows that our system significantly outperforms other current models, e.g. supervised entity-based and discourse relation-based models.

Based on the above research, we integrate our solutions to these three key problems into our tree kernel-based discourse parsing platform. To verify the practical value of these methods in NLP applications, we investigate the application of DSA to the student essay readability assessment task, taking a linear combination of the discourse relation value and the discourse coherence value as the readability value. We train the linear parameters on the open dataset and test them on the actual dataset. Evaluation on the actual dataset shows the influence of our discourse structure analysis platform on student essay readability assessment. It also shows that our model significantly outperforms other current models, e.g. supervised entity-based and discourse relation-based models. It not only significantly improves system performance but also reduces the dependence on large-scale annotated corpora.

The major innovations of this dissertation are as follows. For IDRR, we present an attitude prosody theory-based model that recognizes implicit discourse relations by calculating sentence-level attitude/sentiment information and also integrates a dependency word pair tree structure via a composite kernel. Evaluation on both the open and closed corpora shows a performance improvement of about 6% over state-of-the-art approaches. For DAI, we present a model based on a shallow semantic parsing framework. Our parsing approach extends DAI from the chunking level to the parse tree level, where rich syntactic information is available, and focuses on determining whether a constituent, rather than a token, is an argument.
Evaluation on the benchmark PDTB corpus shows performance improvements of about 2% and 60% over state-of-the-art approaches when using gold and automatic parse trees respectively. For DCM, we present a theme-rheme structure cohesion theory-based model that describes discourse coherence by calculating the similarity between the themes or rhemes of sentences and also integrates two rule-based coherence filtering mechanisms, one based on theme structure and one based on coreference. Evaluation on the benchmark Accident, Earthquake, Wall Street Journal and Britannica Elementary corpora shows a performance improvement of about 3%-6% over state-of-the-art approaches.

The major contributions of this dissertation lie in presenting solutions to the key technologies of DSA and designing the corresponding algorithms. Experiments show that the above research not only significantly improves the performance of discourse structure analysis but also reduces its dependence on large-scale annotated corpora. The proposed approach lays a foundation for, and offers considerable reference value to, future research in discourse structure analysis.
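
The inter-sentence DAI heuristic described in the abstract (one argument is the word sequence from the connective to the end of its sentence, the other is the immediately preceding sentence) can be sketched roughly as follows. This is only a minimal illustration, not the dissertation's implementation: the function name `inter_sentence_arguments`, the token-list representation of sentences, and the `conn_start`/`conn_len` parameters are all assumptions made for the example, as is the PDTB-style labelling of the two spans as Arg1 and Arg2.

```python
from typing import List, Optional, Tuple

Sentence = List[str]  # assumed representation: a sentence is a list of tokens


def inter_sentence_arguments(
    sentences: List[Sentence],
    sent_idx: int,
    conn_start: int,
    conn_len: int = 1,
) -> Optional[Tuple[Sentence, Sentence]]:
    """Hypothetical helper applying the heuristic from the abstract.

    Returns (arg1, arg2), where
      arg1 = the sentence directly before the one containing the connective,
      arg2 = the tokens after the connective up to the end of its sentence.
    Returns None when there is no previous sentence to serve as arg1.
    """
    if sent_idx <= 0 or sent_idx >= len(sentences):
        return None  # no preceding sentence, so the rule does not apply

    current = sentences[sent_idx]
    conn_end = conn_start + conn_len
    if conn_start < 0 or conn_end > len(current):
        raise ValueError("connective span lies outside the sentence")

    arg1 = list(sentences[sent_idx - 1])  # whole previous sentence
    arg2 = list(current[conn_end:])       # everything after the connective
    return arg1, arg2


if __name__ == "__main__":
    doc = [
        ["It", "rained", "all", "day", "."],
        ["However", ",", "the", "match", "went", "ahead", "."],
    ]
    print(inter_sentence_arguments(doc, sent_idx=1, conn_start=0))
```

Punctuation tokens are kept inside the returned spans here; whether they should be trimmed is a design choice the abstract does not specify.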
Keywords/Search Tags: Natural Language Processing, Discourse Structure Analysis, Attitude Prosody Theory, Shallow Semantic Parsing, Theme-rheme Structure Theory