Research On The Topic-oriented Summarization For Web Documents

Posted on:2012-05-15

Degree:Master

Type:Thesis

Country:China

Candidate:G X Deng

Full Text:PDF

GTID:2218330368992869

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, information explosion has become a serious problem. Acquiring information effectively is increasingly difficult because the huge set of web pages contains much duplicate information. Providing users with concise information that meets their needs is therefore an urgent problem. To address it, we study topic-oriented summarization for Web documents. The research can be summarized as follows:

1. We studied how to classify the sentences in Web documents into the aspects of a given topic. We proposed a method based on dependency relations to compute sentence similarity. A clustering approach based on that method was applied to group the sentences into clusters, and features (words and grammar) were extracted from each cluster to form patterns. These patterns can be used to identify topic-related sentences. Furthermore, a classifier was used to identify topic-related sentences; it adjusted the weight of each word according to its level in the parse tree, selected effective dependency relations as features, and used syntactic trees as structural features based on verb-driven tree pruning.

2. For sentence selection, this dissertation proposed a novel method. First, each candidate sentence was labeled according to features of Web documents. Then the score of each candidate sentence was adjusted based on sentence similarity, document links, sentence anchors and the relative position between sentences. Finally, an MMI algorithm was used to choose sentences according to the candidates' scores.

3. A novel context-based approach to sentence ordering for multi-document summarization was proposed. The method first determines whether two summary sentences should be adjacent according to the similarity between one summary sentence and the context of the other, and then computes an adjacency reliability based on that similarity and the relative position. The first sentence is chosen based on features of the sentence. The sentence with the maximum adjacency reliability with respect to the previous sentence is then selected as the next one.

Evaluation shows that our approaches achieve high performance and that the latter outperforms state-of-the-art methods.
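
The greedy ordering step described in item 3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the adjacency reliability formula or the features used to pick the first sentence, so the hypothetical `jaccard` function stands in for reliability (plain word overlap, ignoring context and relative position), and the first sentence is passed in as `first_index`. The names `order_sentences`, `jaccard`, and `first_index` are all made up for this sketch.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Placeholder for adjacency reliability (an assumption, not the
    thesis's measure): Jaccard overlap of lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def order_sentences(
    sentences: List[str],
    first_index: int,
    reliability: Callable[[str, str], float] = jaccard,
) -> List[str]:
    """Greedy ordering: start from the chosen first sentence, then
    repeatedly append the remaining sentence whose reliability with
    the most recently placed sentence is highest."""
    remaining = list(range(len(sentences)))
    remaining.remove(first_index)
    ordered = [first_index]
    while remaining:
        prev = sentences[ordered[-1]]
        # Pick the candidate most likely to follow the previous sentence.
        best = max(remaining, key=lambda i: reliability(prev, sentences[i]))
        remaining.remove(best)
        ordered.append(best)
    return [sentences[i] for i in ordered]
```

Passing the reliability function as a parameter keeps the greedy loop separate from the scoring, so a context-based measure of the kind the abstract describes could be substituted without changing the ordering logic.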

Keywords/Search Tags:

multi-document summarization, Web Documents, sentence classification, dependency relation, sentence ordering

Related items

1	Research On Key Technologies Of Chinese Multi-Document Summarization
2	Research On Some Key Technologies Of Sentence Ordering For Information Fusion
3	Research On Summary Sentence Selection And Ordering In Query-focused Multi-document Summarization
4	Research And Application Of Multi-document Automatic Summarization
5	Multi-document Summarization Based On Basic Element
6	The Approach For Event-based Multi-document Automatic Summarization
7	Chinese Query-Focused Multi-document Summarization Based On Cloud Model
8	Automatic Summarization Method For Single Document Based On Sentence Embedding And Coreference Relation
9	Research Of Web Multi-document Automatic Summarization
10	Research On Extractive Multi-document Summarization Using Supervised Deep Learning