Research On Key Technologies Of Chinese Multi-Document Summarization

Posted on: 2008-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: C Yao
Full Text: PDF
GTID: 2178360245998092
Subject: Computer Science and Technology
Abstract/Summary:
As information on the World Wide Web grows exponentially, information overload has become a significant problem. Because large numbers of web pages contain much redundant information, it is difficult to find what we really need. Researchers have recently paid much attention to how to present brief, concise information to users and make acquiring needed information more efficient, and multi-document summarization has become an active research field.

Multi-document summarization is the process of distilling the most important information from a set of source documents to produce an abridged version for a particular user and task. This thesis discusses three key techniques of Chinese multi-document summarization: the sentence weighting model, sentence selection, and sentence ordering. They are discussed in detail as follows:

1. Sentence weighting model. Topic signatures are extracted automatically using the log-likelihood ratio, and a sentence weighting model is proposed that combines vocabulary weight, sentence position, and sentence length as features. Experimental results show that the model assigns higher weights to important sentences, so that a summary covering the important information can be generated.

2. Sentence selection. We propose an optimal sentence selection method. Starting from a large set of candidate summary sentences, it deletes the sentences that contain little important information one by one until the specified length is reached, producing a summary with the most important information and the least redundancy. Experiments are conducted on both English and Chinese corpora, and the results under different parameter values are compared. The results demonstrate the validity of the method.

3. Sentence ordering. The order of summary sentences strongly influences how well readers understand the original articles, so we study sentence ordering. We first review existing ordering algorithms, mainly the Majority Ordering algorithm, and then propose a cohesion-based, bottom-up sentence ordering algorithm for Chinese multi-document summarization. Experiments show that this method performs better than the Majority Ordering algorithm.
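
The abstract does not give formulas, so the following Python sketch only illustrates the general shape of the first two techniques under stated assumptions. It scores terms with a binomial log-likelihood ratio to pick topic signatures, combines vocabulary, position, and length features into a sentence weight, and trims a candidate set by repeated deletion. The function names, the feature weights (`w_vocab`, `w_pos`, `w_len`), the `ideal_len` parameter, and the 10.83 threshold are hypothetical choices for illustration, not values from the thesis. The deletion step here uses sentence weight alone, whereas the thesis method also accounts for redundancy.

```python
import math
from collections import Counter


def _log_l(k, n, p):
    """Binomial log-likelihood of k hits in n trials at rate p."""
    # Clamp p away from 0 and 1 so that log() stays defined.
    p = min(max(p, 1e-12), 1 - 1e-12)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def llr(k1, n1, k2, n2):
    """-2 log(lambda) for a term seen k1 times in n1 topic tokens
    and k2 times in n2 background tokens."""
    p = (k1 + k2) / (n1 + n2)      # shared rate under the null hypothesis
    p1, p2 = k1 / n1, k2 / n2      # separate rates under the alternative
    return 2.0 * (_log_l(k1, n1, p1) + _log_l(k2, n2, p2)
                  - _log_l(k1, n1, p) - _log_l(k2, n2, p))


def topic_signatures(topic_tokens, background_tokens, threshold=10.83):
    """Terms over-represented in the topic set (threshold is an assumption)."""
    topic, background = Counter(topic_tokens), Counter(background_tokens)
    n1, n2 = len(topic_tokens), len(background_tokens)
    signatures = {}
    for term, k1 in topic.items():
        k2 = background[term]
        score = llr(k1, n1, k2, n2)
        if score > threshold and k1 / n1 > k2 / n2:
            signatures[term] = score
    return signatures


def sentence_weight(tokens, position, n_sentences, signatures,
                    w_vocab=0.6, w_pos=0.3, w_len=0.1, ideal_len=20):
    """Weighted sum of three features; all weights are illustrative."""
    if not tokens:
        return 0.0
    vocab = sum(1 for t in tokens if t in signatures) / len(tokens)
    pos = 1.0 - position / n_sentences           # earlier sentences score higher
    length = min(len(tokens), ideal_len) / ideal_len
    return w_vocab * vocab + w_pos * pos + w_len * length


def select_by_deletion(sentences, weights, max_tokens):
    """Drop the lowest-weight sentence until the token budget is met."""
    kept = list(range(len(sentences)))
    while kept and sum(len(sentences[i]) for i in kept) > max_tokens:
        kept.remove(min(kept, key=lambda i: weights[i]))
    return [sentences[i] for i in kept]
```

Sentences are represented as token lists, so for Chinese text a word segmenter would have to run first; that step is outside this sketch.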
Keywords/Search Tags: multi-document summarization, topic signature, sentence weighting, summary sentence selection, summary sentence ordering