
Link-based search for similar pages on the Web

Posted on: 2005-06-02
Degree: M.C.Sc
Type: Thesis
University: Dalhousie University (Canada)
Candidate: Xiaomeng, Wan
GTID: 2458390008482307
Subject: Computer Science
Abstract/Summary:
Identifying similar web pages is a crucial task for a search engine. Traditionally, similarity is computed from the content of web pages, but recent research suggests that the link structure can also be exploited for this task. In this project, we present several such link-based (or graph-based) algorithms for identifying similar web pages. We implemented and evaluated them on the .GOV data set, a filtered crawl of the .gov domain prepared for the TREC Web track. Their performance was compared with that of a previously proposed link-based algorithm, Dean and Henzinger's Companion algorithm, and with TFIDF, one of the prevalent content-based algorithms.

Our study shows that our algorithms perform better than, or competitively with, the Companion method. Furthermore, the useful results returned by our algorithms differ considerably from those returned by TFIDF.
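The abstract names Dean and Henzinger's Companion algorithm as the link-based baseline without describing it. As a rough illustration of the general idea, Companion builds a small "vicinity graph" around the query page (its parents, its children, the other children of its parents, and the other parents of its children) and ranks the pages in that graph with a HITS-style hub/authority computation, returning the top authorities as similar pages. The Python sketch below is a simplified version under stated assumptions: it omits Companion's edge weighting, its duplicate and same-host filtering, and its limits on how many parents and children are followed. The function name `companion_like` and the toy graph `web` are hypothetical, and this is not one of the algorithms proposed in this thesis.

```python
from collections import defaultdict


def companion_like(query, out_links, top_k=5, iterations=20):
    """Rank pages related to `query` using HITS on a small vicinity graph.

    Simplified, hypothetical sketch of the Companion idea: no edge weights,
    no near-duplicate or same-host filtering, no parent/child limits.
    """
    # Invert the link graph so we can look up parents (pages linking in).
    in_links = defaultdict(set)
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links[dst].add(src)

    # Vicinity graph: the query, its parents and children, the children
    # of its parents (siblings), and the parents of its children.
    parents = set(in_links[query])
    children = set(out_links.get(query, ()))
    nodes = {query} | parents | children
    for p in parents:
        nodes |= set(out_links.get(p, ()))
    for c in children:
        nodes |= in_links[c]

    # Keep only edges whose endpoints both lie in the vicinity graph.
    edges = [(s, d) for s in nodes for d in out_links.get(s, ()) if d in nodes]

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages pointing to it.
        new_auth = {n: 0.0 for n in nodes}
        for s, d in edges:
            new_auth[d] += hub[s]
        # Hub score: sum of authority scores of the pages it points to.
        new_hub = {n: 0.0 for n in nodes}
        for s, d in edges:
            new_hub[s] += new_auth[d]
        # L2-normalize each score vector so values stay bounded.
        for scores in (new_auth, new_hub):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for n in scores:
                scores[n] /= norm
        auth, hub = new_auth, new_hub

    # Highest-authority pages other than the query itself; ties by name.
    ranked = sorted((n for n in nodes if n != query), key=lambda n: (-auth[n], n))
    return ranked[:top_k]


# Hypothetical toy link graph: page -> pages it links to.
web = {
    "hub1": ["q", "a", "b"],
    "hub2": ["q", "a"],
    "hub3": ["a", "c"],
    "q": ["x"],
    "p": ["x", "y"],
}
print(companion_like("q", web, top_k=3))  # ['a', 'b', 'x']
```

In the toy graph, `a` is co-cited with `q` by both `hub1` and `hub2`, so it ranks above `b`, which only `hub1` co-cites. Pages such as `hub3` and `c` never enter the vicinity graph, which illustrates why a link-based method can surface a different set of pages than a content-based method like TFIDF.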
Keywords/Search Tags: Web, Pages, Link-based, Algorithms