An Automatic Summarization Method for Chinese Documents Based on Comprehensive Background Concept Lattice

Posted on: 2012-04-09  Degree: Master  Type: Thesis
Country: China  Candidate: C Gao
GTID: 2178330335491534  Subject: Computer application technology
Abstract/Summary:
With the development of information technology and the spread of the Internet, the demand for text processing tasks such as automatic summarization is increasing. Research on Chinese automatic summarization started later than research on English automatic summarization, and the characteristics of the Chinese language make the summarization process more difficult. This thesis first surveys and analyzes the state of research on automatic summarization. It then presents an automatic summarization method for Chinese documents based on concept lattice theory, together with a corresponding system framework named CBCL-TAS. The thesis mainly describes the algorithms for document splitting, word grab, feature extraction, and text concept lattice construction. The automatic summarization system built on the proposed method is evaluated in terms of efficiency and performance. The proposed method uses the comprehensive background of the text to establish a conceptual "framework" of the full text, without needing to consider the semantics of sentences, the relationships between sentences, or the similarity between sentences. In this way, the need to understand natural language is replaced by the need to understand the formal background of the text. Splitting the document into segments is a necessary step of word grab for large-scale documents. The word grab algorithm proposed in this thesis adjusts the segment size dynamically; compared with algorithms that use fixed-size segments, it is more efficient on large-scale texts of more than 500,000 words. Experimental results show that the summaries produced by the system are of good quality, especially for argumentative essays, and that the improved document splitting algorithm keeps the word grab process from being affected by text size.
Keywords/Search Tags: formal background, comprehensive text background, automatic summarization, text concept lattice
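
The abstract does not spell out how the text concept lattice is constructed, so the following is only a minimal sketch of the general idea, not the CBCL-TAS algorithm. It assumes a formal background in which the objects are sentence indices and the attributes are the terms already extracted from each sentence, and it enumerates the formal concepts (extent, intent) with a naive intersection-closure procedure. The function name `formal_concepts` and the toy data are hypothetical.

```python
def formal_concepts(context):
    """Return all (extent, intent) pairs of a small formal context.

    context maps an object id (assumed here: a sentence index) to the
    set of attributes it has (assumed here: terms found in the sentence).
    """
    rows = {g: frozenset(attrs) for g, attrs in context.items()}
    all_attrs = frozenset().union(*rows.values())

    # Every intent is an intersection of object rows; the full attribute
    # set plays the role of the empty intersection.
    intents = {all_attrs}
    for row in rows.values():
        intents |= {intent & row for intent in intents}

    # The extent of an intent is the set of objects having all its attributes.
    result = []
    for intent in intents:
        extent = frozenset(g for g, row in rows.items() if intent <= row)
        result.append((extent, intent))

    # Most general concepts (largest extents) first.
    return sorted(result, key=lambda c: (-len(c[0]), sorted(c[1])))


# Toy formal background: three sentences and their (pre-extracted) terms.
toy_context = {
    0: {"lattice", "summary"},
    1: {"lattice", "text"},
    2: {"lattice", "summary", "text"},
}

toy_concepts = formal_concepts(toy_context)
for extent, intent in toy_concepts:
    print(sorted(extent), sorted(intent))
```

On this toy input the procedure yields four concepts; the most general one covers all three sentences and has the single shared term "lattice" as its intent. The naive enumeration is exponential in the worst case, so it only illustrates the structure on small inputs.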