Research On Automatic Text Summarization Technique Of News Documents

Posted on: 2008-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2178360242472275
Subject: Signal and Information Processing
Abstract/Summary:
Automatic summarization is an important research topic in natural language processing and a powerful aid to information retrieval. It has been applied successfully in both military and civil fields to make information retrieval and dissemination more efficient. This paper discusses methods for the automatic summarization of news documents.

First, a novel automatic summarization method based on fuzzy decision theory is proposed for selecting important sentences as the summary. The traditional approach to topic sentence extraction relies on a fixed, inflexible formula, which limits the number of text features and treats all of them in the same way. The new method merges the text features through fuzzy decision and assigns the features different weights to suit different texts. Experimental results show that the summary produced by the new method is more meaningful than that produced by the traditional one.

Second, an automatic summarization method based on a clustering algorithm is presented, in which clustering is used to obtain the sub-topics of multi-topic documents. A clustering principle named Max-Min Self-similarity (MMS) is provided to obtain the initial values for K-means automatically. To solve the problems caused by K-means, a new algorithm, MMS_MCMRS_PAM, is proposed that combines MMS with MCMRS_PAM (Multi-Centroid Multi-Run Sampling Scheme_Partitioning Around Medoids). Experimental results demonstrate that summaries produced by the new algorithm show a clear advantage over those of other clustering algorithms with respect to neglected points and topic coverage.

Finally, a multi-document summarization method is proposed to address the practical problem of redundant information in online document sets covering the same event. In the new method, a semantic space is used to compute the similarity between words, and a clustering algorithm is used to distinguish the main topic from the sub-topics. In addition, a topic matching method is used to select topic sentences as the summary and to update the summary when new documents arrive. Experimental results show that the multi-document summaries produced by the new method give prominence to the main topic, cover the other sub-topics, and, in particular, pick up new topics dynamically.
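
The abstract does not give the fuzzy decision formula, so the following is only a minimal sketch of the general idea behind the first contribution: several per-sentence features are merged into one score, with weights that can be changed per text. Everything here is an assumption made for illustration, including the three features (title overlap, position, length), the weighted-average merge, and the names `split_sentences`, `sentence_features`, and `summarize`. It is not the method of the thesis.

```python
import re


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    return re.findall(r"\w+", sentence.lower())


def sentence_features(sentences, title):
    """Return one dict of hypothetical features, each in [0, 1], per sentence."""
    title_words = set(tokens(title))
    n = len(sentences)
    max_len = max([len(tokens(s)) for s in sentences] + [1])
    feats = []
    for i, s in enumerate(sentences):
        words = tokens(s)
        # Share of title words that appear in the sentence.
        overlap = len(title_words & set(words)) / len(title_words) if title_words else 0.0
        # Earlier sentences get a higher position value.
        position = 1.0 - i / n
        # Length relative to the longest sentence.
        length = len(words) / max_len
        feats.append({"title": overlap, "position": position, "length": length})
    return feats


def summarize(text, title, weights, k=2):
    """Pick the k highest-scoring sentences and return them in document order.

    The score is a weighted average of the features; this stands in for the
    fuzzy decision step, whose actual form the abstract does not specify.
    """
    sentences = split_sentences(text)
    feats = sentence_features(sentences, title)
    total = sum(weights.values())
    scores = [sum(w * f[name] for name, w in weights.items()) / total for f in feats]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Calling `summarize(text, title, {"title": 0.6, "position": 0.3, "length": 0.1})` favors sentences that echo the headline, while a different weight dictionary shifts the selection toward other features. That adjustability is the property the abstract contrasts with a fixed formula.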
Keywords/Search Tags: Automatic Summarization, Fuzzy Decision Theory, Clustering Algorithm, Topic Sentence, Semantic Space, Topic Matching, News Document