Research and Implementation of a Single-Document Chinese Text Summarization System

Posted on: 2010-07-03  Degree: Master  Type: Thesis
Country: China  Candidate: H W Ceng  Full Text: PDF
GTID: 2178360272491577  Subject: Computer application technology
Abstract/Summary:
In today's era of information explosion, people facing large volumes of raw, unorganized data are often at a loss. Text summarization technology condenses such data and so provides a more effective means of processing information. However, current text summarization systems, especially those for Chinese, do not yet achieve good results. This thesis therefore studies Chinese text summarization systems in depth.

The extractive single-document Chinese text summarization system proposed in this thesis takes a single plain-text document as input and automatically extracts the sentences that fully and accurately reflect the document's central meaning to compose its summary. First, the system obtains the words of the document and their part-of-speech tags using the ICTCLAS tool. Second, it builds a vector space model of the document and extracts sentence features. Third, it uses the NaiveBayes algorithm to learn automatically how the features are used and what their parameters are, which turns the summarization problem into a classification problem. Finally, based on the classification results, it extracts the central sentences to compose the summary.

The thesis is organized as follows. First, it reviews text summarization technology. Second, it presents the overall design, which comprises four modules: ChineseSegmentation, FeatureExtraction, BayesClassify and SentenceExtraction. Third, it describes the implementation of the system. Finally, it introduces the measures used to evaluate system performance and evaluates the system with Weka on 30 documents covering a variety of topics.

For each of the four modules, the scheme was selected by comparing and analyzing recent summarization techniques, with the aim of obtaining the best overall results by making every module as good as possible. The experiments show that the system extracts central sentences well.
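The classification view of summarization described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the three features (sentence position, average term frequency, sentence length), their thresholds, the toy training data, and the names `sentence_features`, `NaiveBayes` and `summarize` are all assumptions made for illustration. Sentences are assumed to be already segmented into word lists (the thesis uses ICTCLAS for that step), and a small hand-written categorical Naive Bayes with Laplace smoothing stands in for the classifier evaluated with Weka.

```python
import math
from collections import Counter, defaultdict

def sentence_features(sentences):
    """Map each pre-segmented sentence (a list of words) to discrete features.

    The features and thresholds are illustrative assumptions, not the thesis's.
    """
    tf = Counter(w for s in sentences for w in s)  # document-level term counts
    last = len(sentences) - 1
    feats = []
    for i, s in enumerate(sentences):
        avg_tf = sum(tf[w] for w in s) / len(s) if s else 0.0
        feats.append((
            "first" if i == 0 else "last" if i == last else "middle",
            "high_tf" if avg_tf >= 2.0 else "low_tf",
            "long" if len(s) >= 5 else "short",
        ))
    return feats

class NaiveBayes:
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = Counter(y)              # class -> number of training rows
        self.counts = defaultdict(Counter)   # (class, feature index) -> value counts
        self.values = defaultdict(set)       # feature index -> values seen in training
        for x, c in zip(X, y):
            for j, v in enumerate(x):
                self.counts[(c, j)][v] += 1
                self.values[j].add(v)
        return self

    def log_posterior(self, x, c):
        # log P(c) + sum_j log P(x_j | c), up to a constant shared by all classes
        lp = math.log(self.prior[c] / sum(self.prior.values()))
        for j, v in enumerate(x):
            num = self.counts[(c, j)][v] + 1
            den = self.prior[c] + len(self.values[j])
            lp += math.log(num / den)
        return lp

    def predict(self, x):
        return max(self.classes, key=lambda c: self.log_posterior(x, c))

def summarize(model, sentences):
    """Keep the sentences the classifier labels as summary sentences (class 1)."""
    feats = sentence_features(sentences)
    return [s for s, f in zip(sentences, feats) if model.predict(f) == 1]

# Toy training document: label 1 marks a summary sentence, 0 a non-summary one.
train_doc = [
    ["text", "summarization", "selects", "central", "sentences"],
    ["hello"],
    ["ok", "thanks"],
    ["summarization", "selects", "central", "sentences", "text"],
]
train_labels = [1, 0, 0, 1]
model = NaiveBayes().fit(sentence_features(train_doc), train_labels)
```

Calling `summarize(model, new_doc)` on another segmented document returns the subset of its sentences classified as central, in their original order. A real system would train on many labeled documents and use richer features derived from the vector space model.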
Keywords/Search Tags: text summarization, vector space model, NaiveBayes