Tag-Oriented Web Page Summarization

Posted on: 2011-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Zhu
Full Text: PDF
GTID: 2178360302474693
Subject: Computer application technology
Abstract/Summary:
With the development of Web 2.0 technology, more and more users attach tags to web documents. The social annotations on a web document are a highly generalized description of the topics it contains. At present, most web document summarization techniques do not take this important user information into account, which makes it hard for the generated summary to capture the main idea of the target page. Therefore, in this paper, we propose a tag-oriented web document summarization approach.

A new tag ranking algorithm named EigenTag is proposed to reduce noise in tags. The algorithm takes both user information and tag information into account to determine the importance of tags, and it improves the weight scores of quality tags that reflect the main idea of the original web document. EigenTag can effectively lower the impact of low-quality tags on tag ranking.

Meanwhile, the tag score of a sentence is based on exact matches between tags and the words in that sentence. Because each web document has only a limited number of quality tags that reflect its main idea, there is a high chance that a clearly relevant sentence uses a related word rather than the tag itself and therefore fails to match; the tag's score then cannot be passed on to the related word in the sentence. To address this, we use the association rule mining algorithm FP-Growth to expand the set of high-quality seed tags. A new method for determining the weight scores of the extended tags is also proposed.

In this paper, we employ four tag-oriented document summarization methods. Experimental results show that tag-oriented summarization achieves a significant improvement over methods that do not use tags.
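
The mismatch problem and the idea of tag expansion described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the EigenTag formula or the extended-tag weighting, so the seed weights are simply passed in, the expansion uses plain pairwise co-occurrence rules instead of FP-Growth, and the weight of an extended tag (seed weight times rule confidence) is an assumption made for illustration. All function names and thresholds here are hypothetical.

```python
import re
from collections import Counter
from itertools import permutations


def expand_tags(seed_weights, tag_sets, min_support=2, min_conf=0.6):
    """Expand seed tags with co-occurring tags via pairwise rules seed -> other.

    Assumption: a simple stand-in for FP-Growth that only mines rules with
    one tag on each side. tag_sets is a list of tag lists, one per annotation.
    """
    single = Counter()  # number of annotations containing each tag
    pair = Counter()    # number of annotations containing the ordered pair (a, b)
    for tags in tag_sets:
        unique = set(tags)
        single.update(unique)
        pair.update(permutations(sorted(unique), 2))

    expanded = dict(seed_weights)
    for (a, b), n_ab in pair.items():
        if a in seed_weights and b not in seed_weights and n_ab >= min_support:
            conf = n_ab / single[a]  # confidence of the rule a -> b
            if conf >= min_conf:
                # Assumed weighting: seed weight discounted by rule confidence.
                expanded[b] = max(expanded.get(b, 0.0), seed_weights[a] * conf)
    return expanded


def score_sentence(sentence, tag_weights):
    """Sum the weights of all tags that exactly match a word in the sentence."""
    words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    return sum(weight for tag, weight in tag_weights.items() if tag in words)


def summarize(sentences, tag_weights, k=1):
    """Return the k highest-scoring sentences in their original order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], tag_weights),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]
```

With only the seed tag "python", a sentence that mentions "programming" but not "python" scores zero under exact matching. After expansion over annotations in which the two tags often co-occur, "programming" receives a discounted weight and the sentence can contribute to the summary.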
Keywords/Search Tags: document summarization, tag, extended tags, association rule