Searching Topic-specific Authoritative Information Sources On The Web With Content And Link Analysis

Posted on: 2004-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Han
GTID: 2168360092981065
Subject: Computer application technology
Abstract/Summary:
Search engines are the most commonly used tools for Web information retrieval, but the quality of their results is still far from satisfactory. A post-processing step called topic distillation is therefore needed before search results are returned to the user. Many of today's Web search services can deliver results at both page and site granularity, yet all existing topic distillation algorithms model the link graph at page granularity only. Such a model not only fails to meet users' multiple-granularity information needs, but also tends to assign unfair influence weights to the authors of different Websites. Moreover, the classical algorithm, HITS, tends to converge on an irrelevant tightly knit community (TKC), which leads to topic drift. This paper presents an improved algorithm, g-HITSc (multiple-granularity HITS combined with content analysis). The algorithm constructs a link graph at page or site granularity, according to the user's need, and uses content analysis to compute the relevance of each node in the graph to the query topic. After eliminating low-relevance nodes and assigning relevance weights to the remaining ones, it applies weighted I/O operations in each iteration. Theoretical analysis and experimental results show that the new algorithm avoids topic drift and identifies more reasonable and meaningful authorities and hubs on the topic.
Keywords/Search Tags: topic distillation, HITS, multiple granularity, content analysis, link analysis, Web IR
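
The abstract does not give the formulas of g-HITSc, so the following is only a minimal Python sketch of the general idea, not the thesis's implementation. It assumes that a "site" is the host name of a URL, that relevance scores in [0, 1] are supplied by some external content-analysis step, and that the weighted I operation (authority update) and O operation (hub update) simply multiply each neighbour's score by that neighbour's relevance. The names `site_of`, `collapse_to_sites`, `weighted_hits`, and the `threshold` parameter are hypothetical.

```python
import math
from urllib.parse import urlparse


def site_of(url):
    """Map a page URL to its host name (one simple, assumed notion of 'site')."""
    return urlparse(url).netloc


def collapse_to_sites(page_links):
    """Turn page-level links into site-level links, dropping intra-site links."""
    site_links = set()
    for src, dst in page_links:
        s, d = site_of(src), site_of(dst)
        if s != d:
            site_links.add((s, d))
    return site_links


def normalize(scores):
    """Scale scores to unit Euclidean length; leave an all-zero vector as is."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def weighted_hits(links, relevance, threshold=0.1, iterations=50):
    """Relevance-weighted HITS (assumed weighting scheme, see lead-in).

    links:     iterable of (source, target) node pairs
    relevance: dict mapping node -> relevance score in [0, 1]
    Returns (authority, hub) dicts over the nodes that pass the threshold.
    """
    # Eliminate low-relevance nodes before iterating.
    nodes = {n for edge in links for n in edge
             if relevance.get(n, 0.0) >= threshold}
    edges = [(s, d) for s, d in links if s in nodes and d in nodes]

    auth = dict.fromkeys(nodes, 1.0)
    hub = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Weighted I operation: a node's authority is the sum of the
        # relevance-weighted hub scores of the nodes linking to it.
        new_auth = dict.fromkeys(nodes, 0.0)
        for s, d in edges:
            new_auth[d] += relevance[s] * hub[s]
        # Weighted O operation: a node's hub score is the sum of the
        # relevance-weighted authority scores of the nodes it links to.
        new_hub = dict.fromkeys(nodes, 0.0)
        for s, d in edges:
            new_hub[s] += relevance[d] * new_auth[d]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub
```

Calling `collapse_to_sites` first yields the site-granularity graph; passing the page-level links directly to `weighted_hits` gives the page-granularity variant. Pruning low-relevance nodes up front is what keeps a densely interlinked but off-topic community from dominating the iteration in this sketch.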