
The Automatic Summarization And Evaluation System Based On Ranking Topic Model

Posted on: 2015-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Xu
GTID: 2268330428981782
Subject: Computer technology
Abstract/Summary:
With the popularity of computer applications and the rapid development of Internet technology, a great deal of information now reaches people in the form of electronic text. How to obtain information quickly and accurately from massive amounts of unstructured text has become an active and difficult research problem. People increasingly want to condense a text so that its main content is expressed in concise words, reducing the time needed to obtain useful information. Automatic summarization is a good tool for this problem. Topic models are widely used in automatic summarization: sentences are selected according to the topic distribution and assembled into the summary. However, the topics in a document set are treated as parallel, with no order among them, so the sentences in the resulting summary carry no corresponding importance ranking. In this paper, we combine the topic model with ranking algorithms to solve this problem, and the resulting ranking topic model effectively improves the quality of the summaries. The work of this paper is as follows:
1. Ranking Topic Model based on feature selection. The feature selection method used in this paper is feature similarity. Based on the topic distribution, we compute the maximal compression index between topics in order to remove redundant topics and compute topic weights, obtaining a ranked topic distribution that guides sentence extraction. Experimental results on DUC2002 show that the feature similarity algorithm improves summary quality.
2. Ranking Topic Model based on a mutual information maximum spanning tree. We compute the mutual information between topics based on the topic distribution and build a maximum spanning tree from it. We then compute weights to rank the topics, obtaining a ranked topic distribution that guides sentence extraction. Experimental results on DUC2002 show that the mutual information maximum spanning tree algorithm improves summary quality.
3. Crowdsourced manual evaluation of the system. Taking into account the cost of manual evaluation and the need to promote the platform, we used WeChat as the platform for evaluating the extracted summaries. Because WeChat is widely used among college students, the dataset in this experiment consists of CET (College English Test) reading comprehension passages, which gives the evaluation some practical applicability and makes the platform convenient to promote.
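Item 1 does not spell out how the compression index yields a topic ranking. The Python sketch below is an illustration under stated assumptions, not the thesis's implementation: it assumes the "maximal compression index" is the maximal information compression index λ2 from feature-similarity-based feature selection (the smallest eigenvalue of the covariance matrix of two variables, which is 0 when they are linearly dependent), that each topic is represented by its column of a document-topic matrix `theta`, and that a topic's weight is its share of total topic mass. The function names, the redundancy `threshold`, and the mass-based weighting are all hypothetical choices.

```python
import math
from itertools import combinations


def compression_index(x, y):
    """Maximal information compression index (lambda_2) of two equal-length
    sequences: the smallest eigenvalue of their 2x2 covariance matrix.
    It is 0 when x and y are linearly dependent, i.e. fully redundant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Smaller eigenvalue of [[vx, cxy], [cxy, vy]].
    return 0.5 * (vx + vy - math.sqrt((vx - vy) ** 2 + 4 * cxy ** 2))


def rank_topics(theta, threshold=1e-4):
    """theta: one row per document, each row a list of K topic proportions.

    Hypothetical procedure: a topic pair whose lambda_2 falls below
    `threshold` is treated as redundant and the lighter topic is dropped;
    surviving topics are weighted by their share of total topic mass.
    Returns (topic_id, weight) pairs, heaviest first."""
    k = len(theta[0])
    cols = [[doc[t] for doc in theta] for t in range(k)]
    mass = [sum(c) for c in cols]
    lam = {(i, j): compression_index(cols[i], cols[j])
           for i, j in combinations(range(k), 2)}
    kept = set(range(k))
    # Visit pairs from most to least redundant (smallest lambda_2 first).
    for (i, j), value in sorted(lam.items(), key=lambda item: item[1]):
        if value < threshold and i in kept and j in kept:
            kept.discard(i if mass[i] < mass[j] else j)
    total = sum(mass[t] for t in kept)
    return sorted(((t, mass[t] / total) for t in kept), key=lambda p: -p[1])
```

For example, if two topics have exactly proportional columns across documents, their λ2 is 0, so only the heavier of the two survives into the ranking. Because λ2 scales with the variance of the proportions, any real threshold would need tuning on the data.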
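The mutual information approach in item 2 can be sketched as follows. This is a hedged illustration, not the thesis's code: it assumes topic proportions are binarized (a topic counts as active in a document when its proportion exceeds the uniform level 1/K) so that mutual information can be estimated from counts, it builds the maximum spanning tree with Prim's algorithm, and it assumes each topic's weight is the normalized sum of mutual information on its tree edges. All function names and the binarization rule are hypothetical.

```python
import math
from itertools import combinations


def binarize(theta):
    """Hypothetical discretization: topic t is active (1) in a document
    when its proportion exceeds the uniform level 1/K."""
    k = len(theta[0])
    return [[1 if doc[t] > 1.0 / k else 0 for doc in theta] for t in range(k)]


def mutual_information(x, y):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def max_spanning_tree(k, weight):
    """Prim's algorithm on the complete graph over k nodes, always adding
    the heaviest edge that leaves the current tree.
    weight: dict mapping frozenset({u, v}) -> edge weight."""
    in_tree, edges = {0}, []
    while len(in_tree) < k:
        u, v = max(((u, v) for u in sorted(in_tree)
                    for v in range(k) if v not in in_tree),
                   key=lambda e: weight[frozenset(e)])
        in_tree.add(v)
        edges.append((u, v))
    return edges


def rank_topics_mst(theta):
    """Rank topics by the total mutual information on their tree edges."""
    cols = binarize(theta)
    k = len(cols)
    mi = {frozenset((i, j)): mutual_information(cols[i], cols[j])
          for i, j in combinations(range(k), 2)}
    score = [0.0] * k
    for u, v in max_spanning_tree(k, mi):
        w = mi[frozenset((u, v))]
        score[u] += w
        score[v] += w
    total = sum(score) or 1.0  # avoid division by zero if all MI is 0
    return sorted(((t, score[t] / total) for t in range(k)),
                  key=lambda p: -p[1])
```

Scoring topics by the mutual information on their tree edges favors topics that sit at the center of the dependency structure; other weightings over the same tree are equally plausible.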
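Both ranking models in items 1 and 2 end by using the ranked topic distribution to guide sentence extraction. A minimal sketch, assuming each sentence already has a topic distribution (for example from LDA inference) and scoring a sentence s as the weighted sum ∑ w_k p(k|s) over topics k; the function name and the top-n selection rule are hypothetical:

```python
def extract_summary(sentences, sentence_topics, topic_weights, n=2):
    """sentences: list of str.
    sentence_topics: per-sentence topic proportions (lists of length K).
    topic_weights: dict topic_id -> weight produced by a ranking step.
    Returns the n highest-scoring sentences in their original order."""
    scores = [sum(topic_weights.get(t, 0.0) * p for t, p in enumerate(dist))
              for dist in sentence_topics]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:n]
    # Keep document order so the summary reads naturally.
    return [sentences[i] for i in sorted(top)]
```

This also shows why parallel topics are a problem: with equal weights for every topic, each sentence's score reduces to the same constant (up to rounding) because its topic proportions sum to 1, so the scores carry no ranking information. Unequal weights from a ranking step break that tie.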
Keywords/Search Tags: Automatic Summarization, Topic Model, Ranking Algorithms, Summarization Evaluation, WeChat