
Automatic Summarization For Chinese Text Based On Sub Topic Partition And Sentence Features

Posted on: 2013-05-27 | Degree: Master | Type: Thesis
Country: China | Candidate: J P Zhang | Full Text: PDF
GTID: 2248330362974140 | Subject: Computer software and theory
Abstract/Summary:
With the explosion of electronic information on the web, it has become increasingly important to obtain required information accurately and efficiently. As an overview of a document's content, a summary offers conciseness, generality, readability and objectivity, which helps people find useful information efficiently. Automatic summarization has therefore become one of the hot topics in natural language processing in recent years.

This paper first introduces the concept of summarization, the current state of research and the main methods of automatic summarization. Then, based on an analysis of the advantages and disadvantages of automatic summarization with the LexRank algorithm, it proposes a new method of automatic summarization based on sub topic partition and sentence features:

① The Chinese document is viewed as an undirected weighted graph in which vertices represent sentences and edges are weighted by the similarity between pairs of sentences. The maximum spanning tree of this graph is computed, and an improved K-means algorithm clusters the sentences based on the tree; each cluster represents a sub topic. Partitioning the document into sub topics addresses low theme coverage and yields a more comprehensive summary.

② Within each sub topic, sentence weights are computed from the sentence salience given by the LexRank algorithm and the scores of sentence features such as length, position, title words, cue words and sentence structure. In this way, the weight of every sentence is more accurate and comprehensive.

③ The sub topics are sorted in decreasing order before extraction, and the sentence with the highest weight in each sub topic is then extracted into the summary list according to the required compression ratio. After a sentence is chosen from a sub topic, the weights of the other sentences in that sub topic are recalculated to avoid extracting similar ones next time. In this way, the summary comprehensively expresses the important topics of the document with less redundancy.

Finally, using the single-document summarization corpus supplied by the Research Center for Social Computing and Information Retrieval of Harbin Institute of Technology, comparison experiments are carried out on three automatic summarization systems. The first is the method proposed in this paper, the second is based on the LexRank algorithm, and the third is based on sentence features and the LexRank algorithm. The experimental results show that the proposed method performs better than the other two in terms of precision, recall, F-measure and ROUGE value, and produces high-quality summaries.
Keywords/Search Tags: Automatic summarization, Sub topic partition, Sentence features, K-means algorithm, Sentence weight
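The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and it rests on several assumptions: sentences arrive already segmented into tokens, similarity is plain bag-of-words cosine, the sub topic step cuts the weakest edges of the maximum spanning tree as a simple stand-in for the improved K-means algorithm (whose details the abstract does not give), the only sentence feature is position, and the names `alpha`, `k` and `budget` are hypothetical parameters chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(sentences):
    """Sentence graph as a dense matrix; sentences are lists of tokens."""
    bags = [Counter(s) for s in sentences]
    n = len(bags)
    return [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]


def lexrank(sim, damping=0.85, iters=50):
    """LexRank-style salience: power iteration on the row-normalized graph."""
    n = len(sim)
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / row_sums[j] * scores[j] for j in range(n) if row_sums[j] > 0)
            for i in range(n)
        ]
    return scores


def partition_by_mst(sim, k):
    """Maximum spanning tree (Kruskal), then drop its k-1 weakest edges.
    Assumption: this replaces the thesis's improved K-means step."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(((sim[i][j], i, j) for i in range(n) for j in range(i + 1, n)), reverse=True)
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))
    # Rebuild components from the strongest n-k tree edges only.
    parent = list(range(n))
    for _, i, j in sorted(tree, reverse=True)[: max(n - k, 0)]:
        parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def summarize(sentences, k=2, budget=2, alpha=0.7):
    """Return indices of selected sentences in document order."""
    n = len(sentences)
    sim = similarity_matrix(sentences)
    salience = lexrank(sim)
    top = max(salience)
    # Weight = normalized salience blended with a toy position feature (earlier is better).
    weight = [alpha * salience[i] / top + (1 - alpha) * (1 - i / n) for i in range(n)]
    clusters = partition_by_mst(sim, k)
    clusters.sort(key=lambda c: sum(weight[i] for i in c), reverse=True)
    remaining = [set(c) for c in clusters]
    chosen = []
    budget = min(budget, n)
    while len(chosen) < budget:
        for members in remaining:
            if not members or len(chosen) >= budget:
                continue
            best = max(members, key=lambda i: weight[i])
            members.remove(best)
            chosen.append(best)
            # Down-weight sentences similar to the one just picked (redundancy control).
            for j in members:
                weight[j] *= 1 - sim[j][best]
    return sorted(chosen)


doc = [
    ["cat", "sat", "mat"],
    ["cat", "mat", "sleep"],
    ["stock", "market", "rise"],
    ["stock", "market", "fall"],
]
print(summarize(doc, k=2, budget=2))
```

Taking one sentence per sub topic in turn is what gives the coverage described in step ③, and the multiplicative penalty `1 - sim[j][best]` is only one possible way to recalculate weights after a pick; the abstract does not specify the actual formula.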