The Approach For Event-based Multi-document Automatic Summarization

Posted on: 2011-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Gui
Full Text: PDF
GTID: 2178360305468259
Subject: Natural language processing
Abstract/Summary:
A simple Internet query can return hundreds of pages, and users must spend considerable effort to find the useful information among them. These pages largely repeat the same information, differ in only a small amount of content, and carry a large volume of irrelevant data. There is therefore an urgent need for a tool that helps people review the information quickly: one that provides not only the original documents but also filtered and processed information that brings together the important content of those documents. Multi-document summarization condenses a large collection of documents on related topics into a single summary that covers the topic comprehensively, is concise and well organized, and has low redundancy, freeing readers from tedious, repetitive material. It aims to ease the difficulty of obtaining useful information from massive data, to speed up information access and browsing, and to meet the individual information needs of different users.
This thesis develops a complete event-based multi-document summarization system aimed mainly at Internet pages about events. The system obtains events and their related information from such pages, integrates the relevant information, and compresses it into an automatic summary of the submitted event for the user. The research covers the following aspects.
First, the thesis develops an Internet-based model for Chinese event text summarization. The model adopts the idea of a meta-search engine to obtain a list of events within a certain period directly from the network, and the user can also enter an event topic of interest. It then collects related information from the Internet using search engine technology, and finally uses multi-document automatic summarization to generate the event summary for the user.
Second, for event feature extraction, the thesis uses an inverse odds ratio technique to extract key phrases and selects the 25 highest-weighted feature items as the feature set for the event. It then calculates the weight of each sentence from both internal and external aspects. The internal factors are whether the sentence contains the key features, the sentence length, and the number of event feature words the sentence contains; the external factors are the links between sentences. Finally, to reduce redundant information among sentences, the thesis calculates sentence similarity by fusing multiple features: the first describes the degree of matching between strings, the second measures similarity from the semantics of words, and the third measures similarity from the semantics of the sentence as a whole.
Third, for system evaluation, the thesis uses the Baidu encyclopedia explanation of the event, or a manually written summary, as the reference summary. It then measures the quality of the system summary with three indicators: the average sentence overlap rate, and ROUGE-2 and ROUGE-4 from ROUGE, the internationally used evaluation standard for multi-document summarization.
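As an illustration of the ROUGE-2 and ROUGE-4 indicators mentioned above, the following is a minimal sketch of ROUGE-N recall, computed as the number of reference n-grams that also occur in the system summary (with clipped counts) divided by the total number of reference n-grams. It is not the thesis's evaluation code. The function names `ngrams` and `rouge_n_recall` are hypothetical, the example sentences are invented, and the inputs are assumed to be already tokenized (Chinese text would first need word segmentation).

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter (multiset) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=2):
    """ROUGE-N recall: clipped n-gram matches / number of reference n-grams.

    `candidate` and `reference` are lists of tokens (an assumption of this
    sketch; real summaries must be tokenized or segmented first).
    """
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        # Reference is shorter than n tokens, so no n-grams exist to recall.
        return 0.0
    # Each reference n-gram can be matched at most as often as it
    # appears in the candidate (clipping).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Invented example sentences, used only to exercise the functions.
reference = "the quake hit the city at dawn".split()
candidate = "the quake hit the town at dawn".split()

rouge_2 = rouge_n_recall(candidate, reference, n=2)  # 4 of 6 bigrams match
rouge_4 = rouge_n_recall(candidate, reference, n=4)  # 1 of 4 four-grams match
```

Longer n-grams reward summaries that preserve longer word sequences from the reference, which is why ROUGE-4 is typically much lower than ROUGE-2 for the same pair of texts.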
Keywords/Search Tags: Multi-document Summarization, Event Summarization, Feature Extraction, Sentence Weight, Sentence Similarity Calculation