The Automatic Summarization And Evaluation System Based On Ranking Topic Model

Posted on:2015-01-12

Degree:Master

Type:Thesis

Country:China

Candidate:Y T Xu

Full Text:PDF

GTID:2268330428981782

Subject:Computer technology

Abstract/Summary:

With the spread of computer applications and the rapid development of Internet technology, a large amount of information now reaches people as electronic text. How to obtain information quickly and accurately from massive unstructured text has become a hot and difficult research problem. People increasingly want to condense a text and express its main content in concise words, thereby reducing the time needed to obtain useful information. Automatic summarization is a good tool for this problem. Topic models are widely used in automatic summarization: summaries are generated from the topic sentences obtained through the topic distribution. However, the topics in the distribution over a text collection are treated in parallel, so the sentences in a summary have no corresponding importance ranking. In this thesis, we combine the topic model with ranking algorithms to solve this problem; the resulting ranking topic model effectively improves the quality of summaries. The work of this thesis is as follows:

1. Ranking topic model based on feature selection. The feature selection method used is feature similarity. Based on the topic distribution, we compute the maximal information compression index among topics in order to remove redundant topics and compute topic weights, and then obtain the ranked topic distribution that guides sentence extraction. Experimental results on DUC2002 show that the feature similarity algorithm improves summary quality.

2. Ranking topic model based on a mutual information maximum spanning tree. We compute the mutual information between topics based on the topic distribution and build a maximum spanning tree from it. We then compute weights to rank the topics and obtain the ranked topic distribution that guides sentence extraction. Experimental results on DUC2002 show that the mutual information maximum spanning tree algorithm improves summary quality.

3. Crowdsourced manual evaluation of the system. Considering the cost of manual evaluation and the promotion of the platform, we used the WeChat platform to evaluate the extracted summaries. Because WeChat is widely used among college students, the data set used in this experiment is CET reading comprehension, which suits this user group and makes the platform convenient to promote.
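The second method, ranking topics with a mutual information maximum spanning tree, can be illustrated with a minimal Python sketch. The abstract does not say how mutual information is estimated or how the topic weights are derived from the tree, so everything specific here is an assumption made for illustration: a topic counts as "active" in a document when its proportion exceeds `1/K`, mutual information is computed between those binary indicators, and a topic's weight is the sum of the mutual information on its incident tree edges. The function names (`mutual_information`, `maximum_spanning_tree`, `rank_topics`) are hypothetical and are not taken from the thesis.

```python
import math
from itertools import combinations


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length binary sequences."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            p_a = sum(1 for xi in x if xi == a) / n
            p_b = sum(1 for yi in y if yi == b) / n
            if p_ab > 0:  # zero-probability cells contribute nothing
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def maximum_spanning_tree(weights):
    """Prim's algorithm on a dense symmetric weight matrix.

    Returns a list of (i, j, weight) edges, grown from node 0 by always
    adding the heaviest edge that leaves the current tree.
    """
    k = len(weights)
    in_tree = {0}
    edges = []
    while len(in_tree) < k:
        i, j = max(
            ((i, j) for i in in_tree for j in range(k) if j not in in_tree),
            key=lambda e: weights[e[0]][e[1]],
        )
        edges.append((i, j, weights[i][j]))
        in_tree.add(j)
    return edges


def rank_topics(doc_topic, threshold=None):
    """Rank topics from a document-by-topic proportion matrix.

    Assumption (not from the thesis): topic weight = sum of mutual
    information over the tree edges that touch the topic.
    """
    k = len(doc_topic[0])
    if threshold is None:
        threshold = 1.0 / k  # assumed activity threshold
    # One binary "topic is active in this document" indicator per topic.
    active = [[1 if row[t] > threshold else 0 for row in doc_topic]
              for t in range(k)]
    mi = [[0.0] * k for _ in range(k)]
    for a, b in combinations(range(k), 2):
        mi[a][b] = mi[b][a] = mutual_information(active[a], active[b])
    tree = maximum_spanning_tree(mi)
    weight = [0.0] * k
    for i, j, w in tree:
        weight[i] += w
        weight[j] += w
    ranking = sorted(range(k), key=lambda t: weight[t], reverse=True)
    return ranking, weight, tree


# Toy document-topic proportions (4 documents, 3 topics), invented for the demo.
toy_doc_topic = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
ranking, weight, tree = rank_topics(toy_doc_topic)
print("topic ranking:", ranking)
```

In a full pipeline, the ranked topics would then decide the order in which topic sentences are extracted. The document-topic matrix would come from a trained topic model such as LDA rather than from hand-written numbers.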

Keywords/Search Tags:

Automatic Summarization, Topic Model, Ranking Algorithms, Summarization Evaluation, WeChat

Related items

1	Research On Ranking Topic Models And Their Applications
2	Research Of Automatic Summarization Oriented To News Text
3	Topic-oriented Automatic Text Summarization
4	Research And Implementation Of Multi-document Automatic Summarization Technology Based On Topic Model
5	Research Of Document Summarization Based On Topic Analysis
6	Chinese Multi-document Automatic Summarization Extraction Based On The Combination Of LDA And TextRank
7	Research On Chinese Automatic Summarization And Its Evaluation Method
8	Research On Text Automatic Summarization Method
9	Research On Automatic Summarization Of News Events
10	Research On Document Summarization Based On LDA Model