
Research On The Topic-oriented Summarization For Web Documents

Posted on: 2012-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: G X Deng
GTID: 2218330368992869
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, information overload has become a serious problem. The huge number of web pages, many of which contain duplicate information, makes it increasingly difficult to acquire information effectively, so providing users with concise information that meets their needs is an urgent problem. To address it, this thesis studies topic-oriented summarization of Web documents. The research can be summarized as follows:

1. Sentence classification. We studied how to classify the sentences in Web documents into the aspects of a given topic. We proposed a method that computes sentence similarity based on dependency relations. A clustering approach built on this similarity groups the sentences into clusters, and the features (words and grammatical structures) extracted from each cluster form patterns that can be used to identify topic-related sentences. Furthermore, a classifier was used to identify topic-related sentences: it adjusts the weight of each word according to the word's level in the parse tree, selects effective dependency relations as features, and uses the syntactic tree, pruned with a verb-driven strategy, as a structural feature.

2. Sentence selection. This thesis proposed a novel sentence selection method. First, each candidate sentence is labeled according to features of the Web documents. Then the score of each candidate sentence is adjusted based on sentence similarity, document links, sentence anchors, and the relative position between sentences. Finally, an MMI algorithm chooses sentences according to the scores of the candidate sentences.

3. Sentence ordering. A novel context-based approach to sentence ordering for multi-document summarization was proposed. The method first detects whether two summary sentences should be adjacent according to the similarity between one summary sentence and the context of the other, and then computes an adjacency reliability from this similarity and the sentences' relative position. The first sentence is chosen based on its features; each subsequent sentence is the one with the maximum adjacency reliability with respect to the previous sentence.

Evaluation shows that our approaches achieve high performance, and that the last of them (sentence ordering) outperforms state-of-the-art approaches.
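The abstract does not give the exact dependency-based similarity formula used in item 1. As a rough illustration only, the sketch below assumes each sentence has already been parsed into (head, relation, dependent) triples and scores two sentences with a Dice-style overlap: identical triples earn full credit, and triples that share only the head/dependent word pair earn partial credit. The function name `dependency_similarity` and the `partial` weight are hypothetical choices, not the thesis's method.

```python
from typing import Iterable, Tuple

# Hypothetical representation: one dependency arc as (head, relation, dependent).
Triple = Tuple[str, str, str]


def dependency_similarity(a: Iterable[Triple], b: Iterable[Triple],
                          partial: float = 0.5) -> float:
    """Dice-style overlap of dependency triples (illustrative assumption).

    Identical triples count 1; triples whose head/dependent words match but
    whose relation label differs count `partial`.
    """
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    exact = sa & sb
    # Word pairs of the triples that did not match exactly.
    pairs_a = {(h, d) for h, _, d in sa - exact}
    pairs_b = {(h, d) for h, _, d in sb - exact}
    loose = len(pairs_a & pairs_b)
    score = len(exact) + partial * loose
    return 2 * score / (len(sa) + len(sb))


s1 = [("eat", "nsubj", "cat"), ("eat", "dobj", "fish")]
s2 = [("eat", "nsubj", "cat"), ("eat", "nmod", "fish")]
print(dependency_similarity(s1, s2))  # 0.75
```

A pairwise score of this kind could then feed any standard clustering routine to group sentences before extracting patterns, as item 1 describes.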
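The abstract names an "MMI algorithm" for the final selection step in item 2 without describing it. As a stand-in, the sketch below uses a different, widely known technique, greedy Maximal Marginal Relevance (MMR) selection, which likewise picks sentences by score while penalizing redundancy. It assumes the adjusted candidate scores and a pairwise similarity function are already available; `mmr_select` and the trade-off weight `lam` are hypothetical names, and this should not be read as the thesis's algorithm.

```python
from typing import Callable, List, Sequence


def mmr_select(scores: Sequence[float], sim: Callable[[int, int], float],
               k: int, lam: float = 0.5) -> List[int]:
    """Greedy MMR-style selection (swapped-in technique, not the thesis's MMI).

    Each step picks the candidate maximizing
    lam * score - (1 - lam) * max similarity to already selected sentences.
    """
    selected: List[int] = []
    candidates = set(range(len(scores)))
    while candidates and len(selected) < k:
        def marginal(i: int) -> float:
            redundancy = max((sim(i, j) for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        # Sorting makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected


# Sentences 0 and 1 are near-duplicates; sentence 2 is distinct.
S = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(mmr_select([0.9, 0.85, 0.4], lambda i, j: S[i][j], k=2))  # [0, 2]
```

With `lam=0.5`, the redundancy penalty outweighs the small score gap, so the distinct sentence 2 is chosen over the near-duplicate sentence 1.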
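The greedy ordering loop in item 3 can be sketched as follows. This assumes two precomputed inputs that the abstract describes only qualitatively: a per-sentence score for how suitable each sentence is as the opening sentence, and an adjacency-reliability matrix where `reliability[i][j]` estimates how reliably sentence `j` follows sentence `i`. How those numbers are derived from context similarity and relative position is not specified, so `greedy_order` and both inputs are hypothetical.

```python
from typing import List, Sequence


def greedy_order(first_scores: Sequence[float],
                 reliability: Sequence[Sequence[float]]) -> List[int]:
    """Greedy ordering by adjacency reliability (illustrative sketch).

    Start with the sentence that scores highest as a first sentence, then
    repeatedly append the unplaced sentence with the highest reliability of
    following the most recently placed one.
    """
    n = len(first_scores)
    if n == 0:
        return []
    current = max(range(n), key=lambda i: first_scores[i])
    order = [current]
    remaining = set(range(n)) - {current}
    while remaining:
        # The key is evaluated before `current` is reassigned.
        current = max(sorted(remaining), key=lambda j: reliability[current][j])
        order.append(current)
        remaining.remove(current)
    return order


rel = [[0.0, 0.1, 0.8],
       [0.7, 0.0, 0.3],
       [0.2, 0.6, 0.0]]
print(greedy_order([0.2, 0.9, 0.5], rel))  # [1, 0, 2]
```

Being greedy, this commits to each choice locally; it is cheap but can miss an ordering with higher total reliability.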
Keywords/Search Tags: multi-document summarization, Web Documents, sentence classification, dependency relation, sentence ordering