
Citation Clustering Based Automatic Multi-Document Summarization

Posted on: 2014-07-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Zhang
GTID: 1268330425477901
Subject: Management Science and Engineering
Abstract/Summary:
The popularity of the Internet has brought a sharp increase in electronic literature. This poses a serious challenge for researchers, especially junior researchers, who must extract useful information effectively and accurately from a massive volume of material. Summarizing domain knowledge so as to improve the efficiency with which researchers access information is therefore becoming increasingly important. Multi-document summarization is an important research topic in natural language processing. It summarizes and compresses documents on the same topic, relieving researchers of the need to read every document and helping them avoid information overload by providing a concise and comprehensive summary.

To summarize the related work in a domain of interest, we build on existing multi-document summarization technologies to study citation clustering based automatic multi-document summarization, focusing on two tasks: citation clustering and summary generation.

For citation clustering, we start from the Vector Space Model (VSM) and apply different text representation and similarity computation methods to obtain six clustering indicators: publication abstract similarity (PAS), publication query-sensitive abstract similarity (PQAS), publication citation context similarity (PCCS), publication query-sensitive citation context similarity (PQCCS), publication co-cite mutual information (PCMI), and publication co-cite proximity score (PCPS).
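The abstract does not give formulas for the indicators. As a minimal sketch, assuming TF-IDF term weighting with cosine similarity for a text-based indicator such as PAS, and pointwise mutual information over co-citation counts for a PCMI-style indicator, the two families of measures might be computed as below. The function names `tfidf_vectors`, `cosine` and `cocite_pmi` and the toy data are hypothetical; the dissertation's exact definitions may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cocite_pmi(reference_lists, a, b):
    """Pointwise mutual information of publications a and b being co-cited.

    reference_lists holds one set of cited publication IDs per citing paper.
    Returns -inf when a and b are never cited together (log of zero).
    """
    n = len(reference_lists)
    n_a = sum(1 for refs in reference_lists if a in refs)
    n_b = sum(1 for refs in reference_lists if b in refs)
    n_ab = sum(1 for refs in reference_lists if a in refs and b in refs)
    if n_ab == 0:
        return float("-inf")
    return math.log(n_ab * n / (n_a * n_b))


# Hypothetical toy abstracts, already tokenized.
abstracts = [
    "citation clustering groups cited papers".split(),
    "clustering cited papers by citation context".split(),
    "sentence extraction for summary generation".split(),
]
vecs = tfidf_vectors(abstracts)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

Query-sensitive variants such as PQAS could plausibly be obtained by weighting or filtering terms by the user query before computing the cosine, but that is likewise an assumption rather than the dissertation's stated method.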
Based on the relationship between the positions at which publications are cited and the topics of the citations, we propose a clustering evaluation method based on citation cited proximity, and use it to evaluate the clustering results obtained with the six indicators. The purpose of citation clustering is to group the documents related to a user's query into clusters in preparation for summary generation.

For summary generation, to condense the multiple documents on the same or similar topics within each cluster, we apply several summarization methods, namely LexRank, Query Sensitive LexRank, MMR, and LexRankMMR. Each method extracts important sentences from the candidate sentence set to produce a descriptive paragraph, and paragraphs of different lengths are generated. Finally, we evaluate each generated paragraph, and the summary composed of these paragraphs, through experiments.
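The abstract names the summarization methods without detailing them. The sketch below shows continuous LexRank (a random walk over a sentence cosine-similarity graph, here with raw term frequencies rather than the idf-modified cosine of the original LexRank, and a PageRank-style `damping` of 0.85) and greedy Maximal Marginal Relevance (MMR). Feeding normalized LexRank scores into MMR as the relevance term is only one plausible reading of "LexRankMMR" and is an assumption; the names `bow_cosine`, `lexrank` and `mmr_select` are hypothetical.

```python
import math
from collections import Counter


def bow_cosine(a, b):
    """Cosine similarity of two bag-of-words Counters (raw term frequencies)."""
    dot = sum(c * b[t] for t, c in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def lexrank(sentences, damping=0.85, iters=200, tol=1e-10):
    """Continuous LexRank: stationary distribution of a random walk on the
    cosine-similarity graph of the sentences (no similarity threshold)."""
    n = len(sentences)
    bows = [Counter(s.lower().split()) for s in sentences]
    sim = [[bow_cosine(bows[i], bows[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Sentences with no similar neighbours spread their mass uniformly.
        dangling = sum(scores[i] for i in range(n) if row_sums[i] == 0)
        new = []
        for j in range(n):
            flow = sum(scores[i] * sim[i][j] / row_sums[i]
                       for i in range(n) if row_sums[i] > 0)
            new.append((1 - damping) / n + damping * (flow + dangling / n))
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def mmr_select(sentences, relevance, k, lam=0.7):
    """Greedy MMR: pick up to k sentences, trading relevance against redundancy."""
    bows = [Counter(s.lower().split()) for s in sentences]
    top = max(relevance)
    # Rescale relevance to [0, 1] so it is comparable with cosine redundancy.
    rel = [r / top for r in relevance] if top > 0 else list(relevance)
    selected, candidates = [], list(range(len(sentences)))

    def marginal(i):
        redundancy = max((bow_cosine(bows[i], bows[j]) for j in selected), default=0.0)
        return lam * rel[i] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


sents = [
    "Citation clustering groups related papers.",
    "Citation clustering groups related papers.",
    "LexRank scores sentences by centrality.",
]
scores = lexrank(sents)
print(mmr_select(sents, scores, k=2, lam=0.3))
```

A lower `lam` penalizes redundancy more heavily, which is why the duplicate sentence is skipped in favour of the distinct one. A query-sensitive LexRank is commonly obtained by replacing the uniform `(1 - damping) / n` jump term with one proportional to each sentence's similarity to the query.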
Keywords/Search Tags: Citation Clustering, Automatic Multi-document Summarization, Clustering Evaluation, Evaluation of a Summary