Research On Automatic Multi-document Summarization Based On Statistics And Semantic Analysis

Posted on: 2010-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: X C Song
Full Text: PDF
GTID: 2178360302959590
Subject: Computer application technology
Abstract/Summary:
In recent years, the development of computer technology and the spread of the Internet have brought us into an ocean of information, which is growing faster than we can imagine. Search engines are currently the most important way of obtaining information, but their results contain a great deal of irrelevant material, so people can hardly find the information they need. Multi-document summarization is a new technology for addressing this problem. It relieves people of redundant information by filtering and collecting information from a set of topic-related documents and extracting concise, comprehensive content. Building on current multi-document automatic summarization technology, this thesis studies the key techniques of semantic concept extraction and clustering and implements a multi-document automatic summarization system based on statistics and semantic analysis. The main work and contributions of the thesis are as follows:
(1) By replacing traditional word statistics with concept statistics and generating the summary from a conceptual vector space model, we reduce the impact of the overlap between word vectors in the traditional vector space model.
(2) Traditional methods usually compute similarity from word forms or word co-occurrence. This thesis improves the computation of sentence similarity by analyzing the intrinsic relationships between words, which increases accuracy.
(3) Using WordNet to disambiguate word senses and to merge concept trees, this thesis builds a tree that describes the document set and proposes a topic concept extraction method. The method extracts topic concepts from the document set and weights the sentences, which improves the quality of multi-document automatic summarization.
(4) Based on a study of topic identification, we improve and optimize the OPTICS algorithm and apply it to multi-document automatic summarization. The new method identifies the topics of a document set more precisely and yields a more comprehensive result.
The multi-document automatic summarization system based on statistics and semantic analysis improves on traditional multi-document automatic summarization systems. The results show that it is more effective than the traditional system and extracts information accurately. Multi-document summarization has a bright future, both as an independent system and as a component of search engines. As the Internet advances, it will have ever more room for development.
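The idea in point (1), counting concepts rather than surface words before building the vector space model, can be illustrated with a minimal sketch. This is not the thesis's implementation: the `CONCEPT_OF` table is a hypothetical hand-written stand-in for a WordNet lookup, and the helper names (`word_vector`, `concept_vector`, `cosine`) are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical word-to-concept table standing in for a WordNet lookup.
# The abstract does not describe the actual concept extraction procedure.
CONCEPT_OF = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
}


def word_vector(sentence):
    """Traditional bag-of-words vector: count surface word forms."""
    return Counter(sentence.lower().split())


def concept_vector(sentence):
    """Concept vector: map each word to its concept (if known), then count."""
    tokens = sentence.lower().split()
    return Counter(CONCEPT_OF.get(token, token) for token in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


s1 = "the car stopped"
s2 = "the automobile stopped"

# Word-level similarity treats "car" and "automobile" as unrelated dimensions.
word_sim = cosine(word_vector(s1), word_vector(s2))
# Concept-level similarity merges them into the single concept "vehicle".
concept_sim = cosine(concept_vector(s1), concept_vector(s2))
```

In this toy example the two sentences differ in one word, so the word-level similarity is about 0.67, while the concept-level similarity is 1.0 (up to floating-point rounding) because both words map to the same concept. A real system would need word sense disambiguation before the lookup, as point (3) notes.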
Keywords/Search Tags: Multi-document Automatic Summarization, Vector Space Model, concept extraction, topic identification, sentence clustering