
Research on Graph-Based Models for Multi-Document Summarization

Posted on: 2010-08-25    Degree: Doctor    Type: Dissertation
Country: China    Candidate: F R Wei    Full Text: PDF
GTID: 1118330332485664    Subject: Computer software and theory
Abstract/Summary:
Sentence ranking is the central issue in document summarization. In recent years, newly emerging graph-based models and ranking algorithms have drawn considerable attention from the extractive document summarization community. While traditional feature-based approaches evaluate sentence significance and rank sentences using features specifically designed to characterize different aspects of individual sentences, graph-based ranking algorithms (such as PageRank-like algorithms) recursively compute sentence significance from the global information in a text graph that links the sentences together.

In general, existing PageRank-like algorithms can model the phenomenon that a sentence is important if it is linked by other important sentences; in other words, they capture the mutual reinforcement among the sentences in the text graph. However, when dealing with multi-document summarization, these algorithms often assemble the set of documents into one large file, so the document dimension is ignored. This thesis focuses on two research problems: first, how to add the document dimension to existing graph-based models and algorithms; second, how to model the query dimension in the text-text similarity computation, since we mostly work on query-oriented multi-document summarization.

Most existing approaches take into account sentence-level relations (e.g. sentence similarity) but neglect the differences among documents and the influence of documents on sentences. We present a novel document-sensitive graph model that emphasizes the influence of global document-set information on local sentence evaluation. By exploiting document-document and document-sentence relations, we distinguish intra-document sentence relations from inter-document sentence relations. In this way, we move towards the goal of truly summarizing multiple documents rather than a single combined document. Based on this model, we develop an iterative sentence ranking algorithm, namely DsR (Document-Sensitive Ranking). Automatic ROUGE evaluations on the DUC data sets show that DsR outperforms previous graph-based models in both generic and query-oriented summarization tasks.

Earlier researchers presented the mutual reinforcement principle (MR) between sentences and terms for simultaneous key-phrase and salient-sentence extraction in generic single-document summarization. In this work, we extend the MR to a mutual reinforcement chain (MRC) over three text granularities, i.e., documents, sentences and terms. The aim is to provide a general reinforcement framework and a formal mathematical model for the MRC. Going one step further, we incorporate the query influence into the MRC to cope with the needs of query-oriented multi-document summarization. While previous summarization approaches often calculate similarity regardless of the query, we develop a query-sensitive similarity to measure the affinity between a pair of texts. When evaluated on the DUC data sets, the experimental results suggest that the proposed query-sensitive MRC (Qs-MRC) is a promising approach to summarization.
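To illustrate the reinforcement-chain idea, the sketch below propagates saliency scores along the document-sentence-term chain by power iteration over column-normalized affinity matrices. The propagation scheme, the equal blending weights, and the function names are assumptions made for illustration only; they are not the dissertation's exact formulation of the MRC or Qs-MRC.

```python
import numpy as np

def normalize_columns(m):
    """Column-normalize a non-negative matrix so each column sums to 1."""
    sums = m.sum(axis=0, keepdims=True)
    sums[sums == 0] = 1.0
    return m / sums

def mrc_rank(doc_sent, sent_term, max_iter=100, tol=1e-6):
    """Illustrative mutual-reinforcement-chain iteration over documents,
    sentences and terms.

    doc_sent[i, j]  : affinity between document i and sentence j
    sent_term[j, k] : affinity between sentence j and term k

    Scores propagate along the chain document <-> sentence <-> term until
    they stabilize. This scheme is an assumption for illustration, not the
    dissertation's exact model.
    """
    n_docs, n_sents = doc_sent.shape
    _, n_terms = sent_term.shape
    d = np.full(n_docs, 1.0 / n_docs)
    s = np.full(n_sents, 1.0 / n_sents)
    t = np.full(n_terms, 1.0 / n_terms)

    ds = normalize_columns(doc_sent)      # document -> sentence weights
    sd = normalize_columns(doc_sent.T)    # sentence -> document weights
    st = normalize_columns(sent_term)     # sentence -> term weights
    ts = normalize_columns(sent_term.T)   # term -> sentence weights

    for _ in range(max_iter):
        d_new = sd.T @ s                                  # documents reinforced by their sentences
        s_new = 0.5 * (ds.T @ d) + 0.5 * (ts.T @ t)       # sentences reinforced by documents and terms
        t_new = st.T @ s                                  # terms reinforced by the sentences containing them
        # Re-normalize so the three score vectors stay on a comparable scale.
        d_new, s_new, t_new = (v / max(v.sum(), 1e-12) for v in (d_new, s_new, t_new))
        if np.abs(s_new - s).sum() < tol:
            d, s, t = d_new, s_new, t_new
            break
        d, s, t = d_new, s_new, t_new
    return d, s, t
```

Under such a scheme, a sentence is reinforced both by the documents that contain it and by the salient terms it contains, while documents and terms are in turn reinforced by salient sentences; the top-scoring sentences are then extracted for the summary.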
In short, the contributions of this part are threefold. First, we extend the mutual reinforcement principle between two objects to the mutual reinforcement chain (MRC) among three (or more) objects and provide a formal mathematical model for it. Second, we design a query-sensitive similarity measure and incorporate it into the MRC, yielding the Qs-MRC. Last but not least, we exploit the effectiveness of the Qs-MRC for sentence ranking in query-oriented multi-document summarization. The work also suggests that more appropriate and mathematically sound query-sensitive similarity measures, as well as more accurate term-context representations, are worth further study. We further study the parameter settings in the MRC and, based on this, present a framework that models the two-level mutual reinforcement among sentences as well as documents. We also explore an interesting and important property of the proposed algorithm. When examined on the DUC 2005 and 2006 data sets for the task of query-oriented multi-document summarization, significant results are achieved.

The main contributions of the thesis are:
1) We present a document-sensitive graph model and algorithm for implicitly adding the document dimension to existing graph models and algorithms.
2) We present the mutual reinforcement chain (MRC) model and algorithm for explicitly adding the document dimension to existing graph models and algorithms. We further study the parameter settings (especially the weight matrix) in the MRC and explore an interesting and important property of the proposed algorithm.
3) We present a query-sensitive similarity metric, which incorporates the query influence into the MRC to cope with the needs of query-oriented multi-document summarization; an illustrative sketch is given after this list. The proposed query-sensitive similarity can also be used in other application scenarios, such as information retrieval and other natural language processing tasks.
4) We conduct theoretical analysis of all the proposed models and algorithms, which makes them general enough for other applications.
5) We examine the effectiveness of the proposed models and algorithms on the DUC evaluations, i.e., the generic and query-oriented multi-document summarization tasks.
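To make the notion concrete, the following is a minimal sketch of what a query-sensitive similarity could look like: it blends the ordinary cosine similarity of two term vectors with a cosine computed only over the dimensions that occur in the query, so that text pairs sharing query-relevant terms score higher. The blending scheme, the weight `alpha`, and the function name are illustrative assumptions, not the exact measure proposed in the dissertation.

```python
import numpy as np

def query_sensitive_similarity(vec_a, vec_b, query_vec, alpha=0.5):
    """Illustrative query-sensitive similarity between two term vectors.

    Blends the plain cosine similarity of the two texts with a cosine
    similarity restricted to the dimensions that also occur in the query.
    The blend and the weight `alpha` are assumptions for illustration.
    """
    def cosine(x, y):
        denom = np.linalg.norm(x) * np.linalg.norm(y)
        return float(x @ y / denom) if denom > 0 else 0.0

    plain = cosine(vec_a, vec_b)
    mask = (query_vec > 0).astype(float)          # keep only query terms
    query_part = cosine(vec_a * mask, vec_b * mask)
    return (1 - alpha) * plain + alpha * query_part

# Example: three-term vocabulary; the query mentions only the first term.
a = np.array([1.0, 2.0, 0.0])
b = np.array([1.0, 0.0, 3.0])
q = np.array([1.0, 0.0, 0.0])
print(query_sensitive_similarity(a, b, q))
```

A measure of this kind can directly replace a query-independent similarity wherever the text graph or the MRC affinity matrices are built, which is how the query dimension enters the ranking.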
Keywords/Search Tags: multi-document summarization, query-oriented summarization, document-sensitive graph model, mutual reinforcement chain, two-level mutual reinforcement, query-sensitive similarity