Summarization Based On Fractal Theory

Posted on: 2005-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: F Lu
GTID: 2168360125950490
Subject: Computer software and theory
Abstract/Summary:
With the development of information technology, and especially the spread of the Internet and large-scale storage media, a boundless ocean of information has formed. How can users find and use the information they need? Various information processing technologies have arisen to help users find and use information quickly and effectively. Automatic summarization can reduce the information-overload problem. Many summarization models have been proposed previously. None of them is entirely based on document structure, and they do not take into account the fact that human abstractors extract sentences according to the hierarchical structure of a document. Document structure can be described as a fractal, a mathematical object with a high degree of redundancy. Fractal theory has been widely applied in digital image compression, which resembles text summarization in that both extract the most important information from a source and reduce its complexity. The fractal summarization model is the first effort to apply fractal theory to document summarization. It generates the summary by a recursive, deterministic algorithm based on an iterated representation of the document. Fractal summarization greatly improves the divergence of information coverage of the summary, and it is robust and transparent: the user can easily control the compression ratio, and the system generates a summary that maximizes information coverage and minimizes the distance between the summary and the source document.
Fractal view is a fractal-based method for controlling the amount of information displayed. The fractal tree concept is extended to any logical tree.
The fractal value of the root of a tree is set to 1, and fractal values are propagated to the other nodes: the fractal value of a parent node is divided by the number of its child nodes, and the result is assigned to each child as its fractal value. A threshold value controls the amount of information displayed: nodes whose fractal value is less than the threshold are hidden.
The Fractal Summarization Model is developed from the models of fractal view and fractal image compression. The source document is partitioned into range blocks according to its structure and represented as a fractal tree. The fractal value of each node is calculated as the sum of the weights of the sentences in its range block. The user may choose a compression ratio that specifies the proportion of sentences to be extracted as the summary. The sentence quota of the summary is calculated accordingly and propagated to the child nodes in proportion to their fractal values.
Fractal Summarization Algorithm
1. Choose a compression ratio and a threshold value.
2. Calculate the total sentence quota of the summary.
3. Partition the document into range blocks.
4. Transform the document into a fractal tree.
5. Set the current node to the root of the fractal tree.
6. Repeat:
   6.1 For each child node under the current node, calculate the fractal value of the child node.
   6.2 Allocate the quota to the child nodes in proportion to their fractal values.
   6.3 For each child node: if its quota is less than the threshold value, select sentences from its range block by extraction; otherwise, set the current node to the child node and repeat steps 6.1, 6.2 and 6.3.
7. Until all the child nodes under the current node are processed.
We adopt concept counting in the Fractal Summarization Model to select theme features. Because of rhetorical needs or differences in writing style, authors often substitute a synonym instead of repeating the same simple, direct word.
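The fractal view rule and the summarization algorithm above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the `Node` class, the function names and the toy document are hypothetical; sentence weights are taken as given, since the abstract does not say how they are computed; fractional quotas are rounded with the largest-remainder method; a block whose children are all sentences is extracted directly; and extraction keeps the highest-weighted sentences of a block.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A range block of the fractal tree (hypothetical); a leaf is one sentence."""
    label: str
    weight: float = 0.0  # sentence weight (leaves only), assumed precomputed
    children: list[Node] = field(default_factory=list)

    def sentences(self) -> list[Node]:
        # All sentences (leaves) under this block, in document order.
        if not self.children:
            return [self]
        return [s for c in self.children for s in c.sentences()]

    def fractal_value(self) -> float:
        # Summarization model: sum of the weights of the sentences in the block.
        return sum(s.weight for s in self.sentences())


def fractal_view(root: Node, threshold: float) -> list[str]:
    """Fractal view: the root gets 1, each child gets its parent's value divided
    by the number of children, and nodes below the threshold are hidden."""
    visible: list[str] = []

    def visit(node: Node, value: float) -> None:
        if value < threshold:
            return  # hidden; descendants can only have smaller values
        visible.append(node.label)
        for child in node.children:
            visit(child, value / len(node.children))

    visit(root, 1.0)
    return visible


def allocate(quota: int, values: list[float]) -> list[int]:
    """Split an integer sentence quota in proportion to the values
    (largest-remainder rounding, an assumption of this sketch)."""
    total = sum(values)
    if total == 0:
        return [0] * len(values)
    exact = [quota * v / total for v in values]
    shares = [math.floor(x) for x in exact]
    leftover = quota - sum(shares)
    # Give the remaining sentences to the largest fractional parts.
    by_remainder = sorted(range(len(values)),
                          key=lambda i: exact[i] - shares[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


def fractal_summarize(root: Node, compression_ratio: float,
                      threshold: int) -> list[str]:
    # Step 2: total sentence quota of the summary.
    total_quota = round(compression_ratio * len(root.sentences()))
    picked: list[Node] = []

    def visit(node: Node, quota: int) -> None:
        # Step 6.3: extract when the quota is below the threshold, or when the
        # block cannot be split further (its children are all sentences).
        if quota < threshold or all(not c.children for c in node.children):
            ranked = sorted(node.sentences(), key=lambda s: s.weight, reverse=True)
            picked.extend(ranked[:quota])
            return
        # Steps 6.1-6.2: fractal values of the children, proportional quotas.
        shares = allocate(quota, [c.fractal_value() for c in node.children])
        for child, share in zip(node.children, shares):
            if share > 0:
                visit(child, share)  # the child becomes the current node

    visit(root, total_quota)
    # Emit the extracted sentences in their original document order.
    position = {id(s): i for i, s in enumerate(root.sentences())}
    return [s.label for s in sorted(picked, key=lambda s: position[id(s)])]


if __name__ == "__main__":
    doc = Node("doc", children=[
        Node("sec1", children=[
            Node("p1", children=[Node("s1", 3), Node("s2", 1)]),
            Node("p2", children=[Node("s3", 2)]),
        ]),
        Node("sec2", children=[
            Node("p3", children=[Node("s4", 4), Node("s5", 1), Node("s6", 1)]),
        ]),
    ])
    print(fractal_view(doc, threshold=0.25))  # ['doc', 'sec1', 'p1', 'p2', 's3', 'sec2', 'p3']
    print(fractal_summarize(doc, compression_ratio=0.5, threshold=2))  # ['s1', 's3', 's4']
```

On the toy document, the fractal view with threshold 0.25 hides the sentences under `p1` (value 0.125) and `p3` (about 0.167) but keeps `s3` (value 0.25). With a compression ratio of 0.5 the quota is 3 sentences; the root splits it 2:1 between the two sections, whose fractal values are equal. The sketch does not redistribute quota that a block with too few sentences cannot use.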
Consequently, relying on feature-frequency information alone is not enough. Sentences at the same level that jointly express one theme may use different, synonymous words, yet the concept they express is the same. Mapping features to concepts, which is helpful...
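A minimal sketch of concept counting, assuming a word-to-concept table is available. The table `CONCEPTS` and the function names are hypothetical stand-ins; the abstract does not say which thesaurus or concept dictionary is used. The example shows how plain word frequency and concept frequency can select different theme features.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical synonym table mapping words to concept labels; the thesis's
# actual thesaurus is not named in this abstract. Unlisted words are treated
# as their own concept.
CONCEPTS = {
    "car": "VEHICLE", "automobile": "VEHICLE", "vehicle": "VEHICLE",
    "fast": "SPEED", "quick": "SPEED", "rapid": "SPEED",
}


def word_counts(tokens: list[str]) -> Counter:
    """Plain feature frequency: every distinct word is counted separately."""
    return Counter(tokens)


def concept_counts(tokens: list[str], concepts: dict[str, str]) -> Counter:
    """Concept counting: map each word to its concept, then count concepts,
    so synonyms pool their occurrences."""
    return Counter(concepts.get(t, t) for t in tokens)


def theme_features(tokens: list[str], concepts: dict[str, str],
                   k: int) -> list[tuple[str, int]]:
    # Take the k most frequent concepts as the document's theme features.
    return concept_counts(tokens, concepts).most_common(k)


if __name__ == "__main__":
    tokens = ["car", "fast", "automobile", "engine", "vehicle", "engine"]
    print(word_counts(tokens).most_common(1))   # [('engine', 2)]
    print(theme_features(tokens, CONCEPTS, 2))  # [('VEHICLE', 3), ('engine', 2)]
```

With plain word counts, `engine` is the most frequent feature; once `car`, `automobile` and `vehicle` are merged into one concept, `VEHICLE` becomes the top theme feature.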
Keywords/Search Tags: Summarization