
Research And Implementation Of Topic-based Multi-Document Summarization

Posted on: 2010-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: S J Li
GTID: 2178360278457520
Subject: Computer application technology
Abstract/Summary:
A summary is a condensed text that accurately and comprehensively reflects the core content of one or more documents. In the current information explosion, information grows far faster than summaries can be written by hand, so people need a fast browsing tool that extracts the important information from the documents and presents it to the user in concise form.

In this thesis, we investigate extractive multi-document automatic summarization. The document feature model takes topic information into account, and summary sentences are extracted with the sentence as the unit, according to the features of the document information. Four features are used to calculate the importance of each sentence: TF*IDF, sentence position, similarity between the sentence and the theme, and sentence length. We combine topic information with document information and assign different weights to each in order to obtain better evaluation results. We also assign different weights to each feature in order to find the relationship between topic information and the other document features, so that the best evaluation result can be found.

We conclude that sentences in specific positions contain more important information that is closer to the topic, and that these sentences are neither too long nor too short. This indicates some overlap between sentence position and both similarity with the theme and sentence length. It also shows that good evaluation results are not obtained when all document features are taken into account. It is therefore necessary to identify the interrelation between the document features and the topic in order to improve the quality of automatic summarization.
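
The four-feature weighted scoring described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact feature formulas, so everything concrete here is an assumption made for illustration: the function names (tokenize, cosine, score_sentences, summarize), the smoothed IDF, the 1/(i+1) position score, the bag-of-words cosine similarity to the topic, the length score peaking at a hypothetical ideal_len, and the linear combination of the features.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_sentences(documents, topic, weights, ideal_len=20):
    """Score each sentence; `documents` is a list of lists of sentences.

    `weights` holds four numbers, in the order
    (TF*IDF, position, topic similarity, length).
    All feature definitions below are illustrative assumptions.
    """
    n_docs = len(documents)
    doc_tokens = [[tokenize(s) for s in doc] for doc in documents]

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for sents in doc_tokens:
        df.update({t for sent in sents for t in sent})
    topic_bag = Counter(tokenize(topic))

    raw = []
    for d, sents in enumerate(doc_tokens):
        tf = Counter(t for sent in sents for t in sent)
        for i, toks in enumerate(sents):
            if not toks:
                continue
            # Feature 1: mean TF*IDF of the sentence's tokens (smoothed IDF).
            tfidf = sum(tf[t] * math.log(1 + n_docs / df[t]) for t in toks) / len(toks)
            # Feature 2: position; earlier sentences score higher.
            position = 1.0 / (i + 1)
            # Feature 3: similarity between the sentence and the topic.
            similarity = cosine(Counter(toks), topic_bag)
            # Feature 4: length; highest when the sentence has ideal_len tokens.
            length = 1.0 / (1.0 + abs(len(toks) - ideal_len) / ideal_len)
            raw.append((d, i, [tfidf, position, similarity, length]))

    # Rescale TF*IDF to [0, 1] so that all four features are comparable.
    max_tfidf = max((f[0] for _, _, f in raw), default=0.0) or 1.0
    scored = []
    for d, i, feats in raw:
        feats[0] /= max_tfidf
        score = sum(w * x for w, x in zip(weights, feats))
        scored.append((score, d, i, documents[d][i]))
    return scored


def summarize(documents, topic, weights, k=2):
    """Return the k highest-scoring sentences (ties keep input order)."""
    ranked = sorted(score_sentences(documents, topic, weights), key=lambda r: -r[0])
    return [sentence for _, _, _, sentence in ranked[:k]]
```

Varying the weights tuple, for example setting one weight to zero, is one simple way to probe the kind of feature overlap the abstract reports between sentence position, theme similarity, and sentence length.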
Keywords/Search Tags: topic information, TF*IDF, sentence position, similarity between the sentence and the theme, sentence length, combination optimization