Research On Automatic Summarization System By K-Means Method

Posted on: 2013-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Qi
Full Text: PDF
GTID: 2248330395959374
Subject: Software engineering
Abstract/Summary:
An automatic summarization system uses algorithms to produce a summary of an article. A summary is a short passage that presents the main idea of an article in a small number of sentences. In the information era, electronic articles are generally replacing traditional ones, so summaries play an important role in helping people decide whether an article is worth reading.

Automatic summarization is divided into single-document and multi-document summarization according to the object being summarized. According to the technology used, it is divided into automatic extraction, understanding-based summarization, and summarization based on information extraction. All three methods apply to both single-document and multi-document summarization.

Broadly, evaluation methods for automatic summarization fall into two categories. The first is intrinsic (internal) evaluation, which judges a system, in relation to its purpose, by directly analyzing the quality of the summaries it produces. Content evaluation uses three measures: precision, recall, and F-value. Readability evaluation covers grammatical correctness, coherence, and cohesion. The second is extrinsic (external) evaluation, an indirect method that judges whether a summarization system is good or bad by how well its summaries perform when applied to a particular task.

Clustering algorithms come from database theory. They aim to group strongly related objects into a class, so that similarity within a class is high while similarity between classes is low. There are three kinds of clustering algorithms: hierarchical (layer-based) algorithms, partitioning (cutting-based) algorithms, and density-based algorithms.

The K-means clustering algorithm was first used for database clustering. With the rapid development of computer science, it has been widely applied to biocomputing, natural language processing, and pattern recognition. The algorithm takes K as its parameter and divides n objects into K clusters, so that objects within a cluster are highly similar while objects in different clusters have low similarity. Similarity is calculated from the mean of the objects in a cluster, which can be seen as the centroid of the cluster.

This thesis builds an automatic summary generation system for English articles. It clusters sentences represented as "frequency" vectors, using K-means as the clustering algorithm. The thesis first introduces automatic summarization systems and the clustering algorithms currently in common use, then proposes an automatic summary generation model based on K-means clustering, and finally presents a simulation experiment, followed by conclusions and an outlook.
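
The abstract does not give implementation details, so the following is only a minimal sketch of how sentence clustering on frequency vectors with K-means might look. Every function name here (`split_sentences`, `frequency_vectors`, `kmeans`, `summarize`) is hypothetical, and the naive sentence splitter, the raw word-count vectors, the Euclidean distance, the evenly spaced initial centroids, and the rule of picking the sentence nearest each centroid are all assumptions made for illustration, not the thesis's method.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def frequency_vectors(sentences):
    # One raw word-count vector per sentence over a shared, sorted vocabulary.
    counts = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    vocab = sorted(set().union(*counts))
    return [[float(c[w]) for w in vocab] for c in counts]


def distance(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    # Component-wise mean, i.e. the centroid of a cluster.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, max_iter=100):
    # Deterministic start (assumption): evenly spaced sentences as centroids.
    step = len(vectors) / k
    centroids = [list(vectors[int(i * step)]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean(members)
    return labels, centroids


def summarize(text, k=2):
    # Pick the sentence closest to each centroid, in original order.
    sentences = split_sentences(text)
    k = min(k, len(sentences))
    if k == 0:
        return []
    vectors = frequency_vectors(sentences)
    labels, centroids = kmeans(vectors, k)
    chosen = set()
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            chosen.add(
                min(members, key=lambda i: distance(vectors[i], centroids[j]))
            )
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    sample = ("Cats chase mice. Cats chase birds. "
              "Stocks fell sharply. Stocks fell again.")
    for sentence in summarize(sample, k=2):
        print(sentence)
```

With K set to the desired number of summary sentences, each cluster contributes one representative sentence. A practical system would likely also remove stop words and weight terms, which this sketch leaves out.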
Keywords/Search Tags: Automatic summarization system, Clustering algorithms, K-means algorithm