
Sentence Extraction For Multi-Document Summarization Based On Topic Model And Semantics

Posted on: 2015-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Yu
GTID: 2298330467963357
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, multi-document summarization has attracted wide attention and developed rapidly. It aims to analyze multiple related documents, extract important non-redundant information, and generate a concise, readable summary, thereby reducing users' information load and improving the efficiency of information processing. Improving and extending existing multi-document summarization techniques is an important trend in this field. In multi-document summarization, topic relationships and semantic information are essential to text understanding, and in extraction-based methods, sentence extraction is both pivotal and complex. We therefore study a sentence extraction method for multi-document summarization based on topic models and semantics. The main work is as follows:
1) Study the origin and development of topic models, and analyze the theory, core ideas, and performance of classic models, especially hierarchical Latent Dirichlet Allocation (hLDA).
2) Design and implement the sentence extraction method of a multi-document summarization system. Analyze the modeling results of hLDA, the factors that influence them, and how those results can be adjusted. Compare hLDA with traditional clustering methods to demonstrate why hLDA is a suitable choice for multi-document summarization.
3) Design sentence scoring algorithms based on hLDA and semantics, and evaluate the scoring results experimentally.
4) Study sentence extraction strategies for selecting candidate summary sentences. Implement automatic evaluation of summary sentence extraction and use it to evaluate the candidate summary sentences. A comparison with other sentence extraction systems shows that our method performs well.
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 61202247 and 71231002.
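Item 1) refers to the core idea of hLDA. hLDA places a nested Chinese restaurant process (nCRP) prior over an unbounded topic tree: each document picks a root-to-leaf path, and at every node it follows an existing child with probability proportional to the number of earlier documents that took that child, or opens a new child with probability proportional to a concentration parameter gamma. The Python sketch below only draws paths from this prior; it is an illustration, not the author's implementation, and it omits word-to-level assignment and Gibbs sampling. The function name and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one root-to-leaf path per document from a nested CRP prior.

    A node is identified by the tuple of child indices leading to it;
    the root is the empty tuple (). `depth` counts the root as level 1.
    """
    rng = random.Random(seed)
    # children[node][k] = number of documents that went through child k
    children = defaultdict(list)
    paths = []
    for _ in range(num_docs):
        node = ()
        path = [node]
        for _ in range(depth - 1):
            counts = children[node]
            total = sum(counts) + gamma
            r = rng.random() * total
            # Existing child k has probability counts[k] / total;
            # a new child has probability gamma / total.
            choice = len(counts)
            acc = 0.0
            for k, c in enumerate(counts):
                acc += c
                if r < acc:
                    choice = k
                    break
            if choice == len(counts):
                counts.append(0)  # open a new branch
            counts[choice] += 1
            node = node + (choice,)
            path.append(node)
        paths.append(path)
    return paths
```

A larger gamma makes new branches more likely and so yields a wider tree, which is one plausible example of the "influence factors" on hLDA results mentioned in item 2).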
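Items 3) and 4) describe scoring sentences with hLDA and semantics and then extracting candidate summary sentences. The abstract gives no formulas, so the following is only a hypothetical sketch: each sentence's score is a weighted mix (weight `alpha`) of a precomputed topic score, assumed to come from the hLDA tree, and the cosine similarity between its term-frequency vector and the centroid of all sentences, a simple stand-in for the semantic component. Extraction then greedily takes the highest-scoring sentences under a word budget, skipping any sentence too similar to one already chosen. All function names, weights, and thresholds are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score_sentences(sentences, topic_scores, alpha=0.5):
    """Mix a per-sentence topic score with similarity to the centroid.

    sentences: list of token lists.
    topic_scores: hypothetical per-sentence scores in [0, 1],
    assumed to be derived from an hLDA tree.
    """
    vecs = [Counter(toks) for toks in sentences]
    centroid = Counter()
    for v in vecs:
        centroid.update(v)
    return [alpha * t + (1 - alpha) * cosine(v, centroid)
            for t, v in zip(topic_scores, vecs)]


def extract(sentences, scores, max_words=20, max_sim=0.5):
    """Greedily pick high-scoring sentences, skipping near-duplicates."""
    vecs = [Counter(toks) for toks in sentences]
    chosen, used = [], 0
    for i in sorted(range(len(sentences)), key=lambda i: -scores[i]):
        if used + len(sentences[i]) > max_words:
            continue  # would exceed the length budget
        if any(cosine(vecs[i], vecs[j]) > max_sim for j in chosen):
            continue  # redundant with an already selected sentence
        chosen.append(i)
        used += len(sentences[i])
    return sorted(chosen)  # restore source order
```

For example, given the sentences "the quake hit the city", "the quake hit the city today", "rescue teams arrived quickly", and "stocks fell" with topic scores 0.9, 0.8, 0.7, and 0.1, `extract` keeps sentences 0, 2, and 3 and drops sentence 1 as a near-duplicate of sentence 0. Returning indices in source order is a common readability heuristic; other orderings (chronological or by topic) are equally possible.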
Keywords/Search Tags: multi-document summarization, topic model, semantic information, hierarchical Latent Dirichlet Allocation, sentence scoring, summar…