Citation Clustering Based Automatic Multi-Document Summarization

Posted on:2014-07-25

Degree:Doctor

Type:Dissertation

Country:China

Candidate:L Zhang

Full Text:PDF

GTID:1268330425477901

Subject:Management Science and Engineering

Abstract/Summary:

The popularity of the Internet has brought a sharp increase in electronic literature, which makes it a serious challenge for researchers, especially junior researchers, to acquire useful information from a massive body of material effectively and accurately. How to summarize domain knowledge so as to improve the efficiency with which researchers access information is therefore becoming more and more important. Multi-document summarization is an important research topic in natural language processing. It summarizes and compresses documents on the same topic, which relieves researchers from reading all of the documents and avoids information overload by providing a concise and comprehensive summary.

In order to summarize the related work in a domain of interest, we build on existing multi-document summarization technologies to study citation clustering based automatic multi-document summarization, focusing on two parts: citation clustering and summary generation.

In the citation clustering part, based on the Vector Space Model (VSM), we derive six clustering indicators from different text representations and similarity computation methods, namely publication abstract similarity (PAS), publication query-sensitive abstract similarity (PQAS), publication citation context similarity (PCCS), publication query-sensitive citation context similarity (PQCCS), publication co-cite mutual information (PCMI) and publication co-cite proximity score (PCPS).
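
As a rough illustration of a VSM-based indicator such as PAS, the sketch below compares two abstracts by the cosine of their term-frequency vectors. The abstract does not state the term weighting or tokenization used in the thesis, so raw term frequency, whitespace tokenization, and the sample texts are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens).

    Assumption: raw term counts; the thesis may use a different weighting.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical abstracts, invented for illustration only.
abstract_1 = "citation clustering for multi-document summarization"
abstract_2 = "clustering citation contexts for summarization"
abstract_3 = "protein folding simulation"

sim_12 = cosine_similarity(tf_vector(abstract_1), tf_vector(abstract_2))
sim_13 = cosine_similarity(tf_vector(abstract_1), tf_vector(abstract_3))
```

Publications whose abstracts share more terms score closer to 1 and are more likely to fall into the same cluster; publications with no shared terms score 0.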
Based on the relationship between the positions at which citations are cited and the topics of those citations, we also propose a cited-proximity based clustering evaluation method to evaluate the clustering results obtained with the six indicators. The purpose of citation clustering is to group the documents related to the user query into clusters in preparation for summary generation.

In the summary generation part, in order to condense the multiple documents on the same or similar topic in each cluster, we use different summarization methods, such as LexRank, Query Sensitive LexRank, MMR and LexRankMMR, to generate paragraphs of different lengths by extracting important sentences from the candidate sentence set to describe these documents. Finally, we evaluate each generated paragraph, and the summary composed of these paragraphs, through experiments.
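
To make the extraction step more concrete, the following is a minimal sketch of greedy MMR (Maximal Marginal Relevance) sentence selection, one of the methods named above. It is not the thesis's implementation: the cosine similarity over raw term counts, the trade-off parameter `lam`, and the sample sentences and query are assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two texts using raw term counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(sentences, query, k, lam=0.7):
    """Greedily pick up to k sentences, trading relevance against redundancy.

    Each step scores a candidate s as
        lam * sim(s, query) - (1 - lam) * max(sim(s, t) for t in selected)
    and moves the highest-scoring candidate into the summary.
    """
    selected = []
    candidates = list(sentences)
    while candidates and len(selected) < k:
        def score(s):
            relevance = cosine(s, query)
            redundancy = max((cosine(s, t) for t in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical candidate sentences and query, invented for illustration only.
sentences = [
    "citation clustering groups related papers",
    "citation clustering groups related papers together",
    "summaries are built from extracted sentences",
]
summary = mmr_select(sentences, query="citation clustering summaries", k=2)
```

With `lam=0.7` the second pick skips the near-duplicate of the first sentence in favor of the less redundant third one; setting `lam=1.0` ignores redundancy and ranks by query relevance alone.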

Keywords/Search Tags:

Citation Clustering, Automatic Multi-document Summarization, Clustering Evaluation, Evaluation of a Summary

Related items

1	Research And Application Of Multi-document Automatic Summarization
2	Research And Implementation Of Document Summarization Based On Combined Multi-Feature
3	The Design And Implementation Of Automatic Summarization System On Chinese Web Pages
4	Chinese Multi-document Automatic Summarization Extraction Based On The Combination Of LDA And TextRank
5	Study On Chinese Text Automatic Summarization Based On Concept Extension And Integrated Evaluation Method
6	Research Of Document Summarization Based On Topic Analysis
7	Research On Mutually Reinforced Manifold-ranking For Multi-document Summarization
8	Research On Key Technologies Of Chinese Multi-Document Summarization
9	Research On Summary Sentence Selection And Ordering In Query-focused Multi-document Summarization
10	The Study, Based On Themes By Web Document Automatic Summarization