Automatic Analysis Of Text Structure Based On Rhetorical Structure Theory

Posted on: 2015-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: P Wang
Full Text: PDF
GTID: 2298330431476374
Subject: Computer application technology
Abstract/Summary:
With the widespread use of the Internet and the development of information technology, a large amount of information floods onto the Internet every day. The sheer volume is confusing, and it is hard to know which information we actually need. Because most information on the Internet exists as text, processing text effectively is an important precondition for obtaining information from it.

Currently, the main methods of text processing are automatic classification, text clustering, and automatic abstracting, which have played a major role in text analysis and in helping people acquire information. However, because most of these methods are based on the vector space model and statistical methods, some semantic information in the texts is always lost. The result cannot reflect the true meaning of the original texts, which in turn reduces the accuracy of abstracting, clustering, and similar tasks. From the perspective of a text's author, the paragraphs, the positions of sentences in the text, and the order of and relations between certain sentences are also part of what is being expressed, and they play an important role in readers' understanding of the text. Therefore, to achieve a full semantic understanding of texts, their structure must be analyzed in detail.

Rhetorical Structure Theory (RST) describes the structure of discourse and is widely used in analyzing various kinds of texts. This thesis studies the theory and applies it to the automatic analysis of Chinese text structure.
First, starting from the structural characteristics of Chinese discourse, we analyzed the role of Rhetorical Structure Theory in describing Chinese text structure and built a conjunction dictionary for rhetorical structure analysis. Then, based on the conjunction dictionary, we designed and implemented an analysis algorithm that constructs a rhetorical structure tree for Chinese text, which lays a foundation for subsequent research on automatic abstracting and related tasks.

The research mainly includes the following:

Firstly, we study the basic terminology, assumptions, and core concepts of Rhetorical Structure Theory and, in light of the structural characteristics of Chinese discourse, analyze the role of the theory in describing Chinese text structure, to provide theoretical support for further research.

Secondly, through a preliminary analysis of the corpus, we determine the set of rhetorical relations used in this study, count conjunction frequencies and pick out the high-frequency conjunctions, work out the specific usage of each conjunction, and build the rhetorical analysis dictionary. In this process, we try to consider every way in which conjunctions can contribute to connecting discourse units into sentences, and design the dictionary structure carefully. In particular, adding a column of collocating words to the dictionary greatly improves the accuracy of the analysis. In addition, building the dictionary as XML files facilitates academic exchange.

Thirdly, we use the rhetorical analysis dictionary to write the rhetorical analysis algorithms, which build a rhetorical structure tree for a text at the paragraph and sentence levels. Taking into account the connecting role of punctuation and conjunctions, and making full use of collocating words, we consider the relationships between clauses, define rules for disambiguating rhetorical structure, and finally build a complete, unambiguous rhetorical structure tree.
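The abstract does not publish the dictionary schema, so the following is only a minimal sketch of how an XML conjunction dictionary with a collocating-words column might be loaded and used to label the relation between two clauses. The element and attribute names (entry, word, relation, position, collocate), the relation labels, and the sample entries are all assumptions made for illustration, not the thesis's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary; the real schema and entries are not given in the abstract.
DICTIONARY_XML = """
<dictionary>
  <entry word="因为" relation="cause" position="front">
    <collocate>所以</collocate>
  </entry>
  <entry word="虽然" relation="concession" position="front">
    <collocate>但是</collocate>
    <collocate>可是</collocate>
  </entry>
  <entry word="如果" relation="condition" position="front">
    <collocate>那么</collocate>
  </entry>
  <entry word="但是" relation="contrast" position="back"/>
</dictionary>
"""


def load_dictionary(xml_text):
    """Parse the XML into {conjunction: (relation, position, [collocates])}."""
    root = ET.fromstring(xml_text)
    entries = {}
    for entry in root.findall("entry"):
        collocates = [c.text for c in entry.findall("collocate")]
        entries[entry.get("word")] = (
            entry.get("relation"),
            entry.get("position"),
            collocates,
        )
    return entries


def find_relation(first_clause, second_clause, entries):
    """Return (relation, confirmed_by_collocate), or None if no conjunction matches."""
    # Pass 1: a front conjunction in the first clause whose collocating word
    # appears in the second clause. This is the strongest evidence.
    for word, (relation, position, collocates) in entries.items():
        if position == "front" and word in first_clause:
            if any(c in second_clause for c in collocates):
                return relation, True
    # Pass 2: fall back to a single conjunction found in its expected clause.
    for word, (relation, position, _) in entries.items():
        clause = first_clause if position == "front" else second_clause
        if word in clause:
            return relation, False
    return None


entries = load_dictionary(DICTIONARY_XML)
print(find_relation("虽然天气很冷", "但是他还是出门了", entries))
print(find_relation("天气很冷", "但是他出门了", entries))
```

In this sketch, "但是" on its own is read as a contrast marker, but when it follows "虽然" the pair is read as concession. That is one plausible way a collocating-words column could reduce ambiguity. The substring matching used here is deliberately naive and would need word segmentation in practice.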
In designing the algorithm, we paid attention to code reuse, because the paragraph and sentence levels share some algorithms. We also took the extensibility of the algorithm into account, so that new modules can be added with only a small amount of code modification, which facilitates future expansion of the program.
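As a rough illustration of how one routine could serve both levels, the sketch below reuses a single tree-building function for clauses within a sentence and for sentences within a paragraph. It is not the thesis's algorithm: the left-to-right folding strategy, the Node class, the small MARKERS table, and the default "joint" label are all assumptions, and nuclearity and disambiguation rules are left out.

```python
import re
from dataclasses import dataclass, field

# Tiny stand-in for the conjunction dictionary (hypothetical entries).
MARKERS = {"因为": "cause", "所以": "result", "但是": "contrast", "然后": "sequence"}


@dataclass
class Node:
    relation: str                      # "leaf" for a plain text span
    text: str = ""
    children: list = field(default_factory=list)


def split_units(text, delimiters):
    """Split text on any of the given punctuation marks, dropping empty parts."""
    parts = re.split("[" + re.escape(delimiters) + "]", text)
    return [p.strip() for p in parts if p.strip()]


def relation_between(right_text):
    """Label a link by the first known marker that opens the right-hand unit."""
    for marker, relation in MARKERS.items():
        if right_text.startswith(marker):
            return relation
    return "joint"  # assumed default when no marker is present


def build_tree(units, to_node):
    """Left-to-right fold shared by the sentence and paragraph levels."""
    if not units:
        raise ValueError("cannot build a tree from an empty list of units")
    nodes = [to_node(u) for u in units]
    tree = nodes[0]
    for unit, node in zip(units[1:], nodes[1:]):
        tree = Node(relation_between(unit), children=[tree, node])
    return tree


def sentence_tree(sentence):
    """Sentence level: leaves are clauses split on commas and semicolons."""
    clauses = split_units(sentence, "，；")
    return build_tree(clauses, lambda c: Node("leaf", text=c))


def paragraph_tree(paragraph):
    """Paragraph level: each sentence becomes a subtree built by sentence_tree."""
    sentences = split_units(paragraph, "。！？")
    return build_tree(sentences, sentence_tree)


tree = paragraph_tree("因为下雨，所以比赛取消了。但是观众没有离开。")
print(tree.relation, tree.children[0].relation)
```

Only the splitting punctuation and the leaf constructor differ between the two levels. Supporting another level, such as sections, would mean adding one more small wrapper around build_tree, which is the kind of low-cost extension the paragraph above describes.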
Keywords/Search Tags: RST, rhetorical analysis dictionary, rhetoric structural tree, rhetorical analysis algorithm, text structure analysis