Research On Lexical Chains And PageRank Based Multi-document Summarization

Posted on: 2009-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Xiao
Full Text: PDF
GTID: 2178360272490076
Subject: Computer software and theory
Abstract/Summary:
With the explosion of information on the Internet, information redundancy has become a serious problem. A single topic often has thousands of related documents whose contents are largely similar but differ in emphasis. A tool is needed that can present the general and most important information from multiple documents in brief, coherent language, and multi-document summarization was proposed to address this need. It generates a brief, fluent summary of a document set and thereby frees readers from trivial and redundant information. Multi-document summarization is a natural outcome of the development of the information age. With its theoretical value and application prospects, it has become an active research topic in text processing.

In this thesis, we first briefly introduce the categories and development history of automatic summarization and describe the traditional methods for single-document and multi-document summarization. We then discuss future directions for research and development in automatic summarization.

Second, we describe in detail the concept of lexical chains and the traditional algorithms for constructing them. After analyzing the advantages and disadvantages of the traditional methods, we propose a new method, a two-phase lexical chain construction algorithm. Experimental results show that this method improves accuracy and is efficient.

Third, we introduce graph-based ranking methods and the PageRank algorithm, and discuss the key problems in applying graph-based ranking to text processing. We further propose a PageRank-based sentence extraction method.

Finally, we present a multi-document summarization system based on lexical chains and PageRank. The system uses lexical chains to analyze the subtopic structure of the documents, ranks the subtopics, and uses the PageRank algorithm to extract sentences from each subtopic to produce the final summary. Summaries generated by this method reflect all the important subtopics well and have lower redundancy. Experimental results show that this integrated approach generates high-quality summaries.
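To make the idea of PageRank-based sentence extraction concrete, the following is a minimal sketch and not the thesis's actual implementation. The abstract does not say how sentence similarity is measured, so this sketch assumes Jaccard word overlap as the edge weight. The function names (`tokenize`, `build_graph`, `pagerank`, `extract`), the damping factor of 0.85, and the fixed iteration count are all illustrative assumptions.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase word set of a sentence (assumed, very simple tokenizer)."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def build_graph(sentences):
    """Symmetric weight matrix; edge weight = Jaccard overlap (an assumption)."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        union = tokens[i] | tokens[j]
        if union:
            sim = len(tokens[i] & tokens[j]) / len(union)
            weights[i][j] = weights[j][i] = sim
    return weights


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over the sentence graph.

    Isolated sentences pass on no score, so the scores need not sum to 1;
    only their relative order is used here.
    """
    n = len(weights)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if out_sum[j] > 0:
                    # Neighbor j shares its score in proportion to edge weight.
                    rank += weights[j][i] / out_sum[j] * scores[j]
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def extract(sentences, k=2):
    """Return the k highest-ranked sentences in their original order."""
    scores = pagerank(build_graph(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    docs = [
        "Cats chase mice.",
        "Dogs chase cats.",
        "Mice eat cheese.",
        "The stock market fell sharply today.",
    ]
    print(extract(docs, k=2))
```

In a system like the one described, such a ranking would presumably be run separately on the sentences of each subtopic identified by the lexical chains, rather than once over the whole document set.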
Keywords/Search Tags: Multi-document summarization, Lexical chains, PageRank