
Applications of Web link analysis

Posted on: 2009-07-15
Degree: Ph.D.
Type: Dissertation
University: Stanford University
Candidate: Gyongyi, Zoltan Istvan
Full Text: PDF
GTID: 1448390002995530
Subject: Computer Science
Abstract/Summary:
Web search engines augment traditional text-based information retrieval techniques with hyperlink analysis to increase the relevance of search results. The most thoroughly studied application of link analysis is the authority-based ranking of web pages. We look beyond assessing page authority and use link analysis to solve other problems that search engines encounter in their quest for better search results.

First, we focus on combating search engine spamming: actions intended to mislead search engines into ranking some web pages higher than they deserve. Over the last five years, the amount of search engine spam has increased dramatically, leading to a degradation of web content quality. To set the stage, we survey current spamming techniques and organize them into a comprehensive taxonomy. Next, we present our first tool for combating spam: TrustRank, which combines input from human experts with link analysis to semi-automatically separate reputable, good pages from spam. Then we delve into detecting a particular technique called link spamming. We begin by analyzing exactly how and why link spamming works. We continue by introducing the concept of spam mass, a measure of link spamming's impact on the ranking of a page. Finally, we discuss how to estimate spam mass and how the estimates can help identify pages that benefit significantly from link spamming.

Second, we turn our attention to web page categorization. Even though web pages are hyperlinked, most proposed efficient classification techniques take little advantage of the link structure and rely primarily on text features. We introduce a link-based approach to classification, centered on summarizing relevant link information about a page into a numeric vector. Our approach can be used in isolation or in conjunction with text-based classification.

The experimental results in this dissertation indicate that the proposed spam detection and web categorization techniques work well on actual web data.
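
The abstract describes TrustRank only at a high level: human experts vet a set of pages, and link analysis spreads that judgment to the rest of the web. The following is a minimal sketch of one way such a scheme could look, assuming a PageRank-style iteration whose random jump lands only on the vetted seed pages. The function name `trust_propagation`, the damping value 0.85, the iteration count, and the toy graph are all illustrative assumptions and are not taken from the dissertation.

```python
def trust_propagation(out_links, trusted_seeds, damping=0.85, iterations=50):
    """Seed-biased, PageRank-style score propagation (illustrative only).

    out_links: dict mapping a page to the list of pages it links to.
    trusted_seeds: set of pages a human reviewer has judged reputable.
    Returns a dict mapping every page to a trust score; scores sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    pages = sorted(pages)

    seeds = [p for p in pages if p in trusted_seeds]
    if not seeds:
        raise ValueError("at least one trusted seed must appear in the graph")

    # Assumption: the random jump is spread uniformly over the seeds only.
    seed_share = {p: (1.0 / len(seeds) if p in trusted_seeds else 0.0)
                  for p in pages}
    scores = dict(seed_share)

    for _ in range(iterations):
        # Every round starts from the seed-only jump distribution.
        next_scores = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for page in pages:
            targets = out_links.get(page, [])
            if targets:
                # A page splits its damped score evenly among its out-links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    next_scores[target] += share
            else:
                # Assumption: pages without out-links return their score
                # to the seeds so that the total stays constant.
                for p in pages:
                    next_scores[p] += damping * scores[page] * seed_share[p]
        scores = next_scores
    return scores


# Hypothetical toy graph: one vetted page, one page it links to,
# and two pages that link only to each other.
toy_graph = {
    "seed": ["good"],
    "good": ["seed"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
toy_scores = trust_propagation(toy_graph, trusted_seeds={"seed"})
```

In this toy graph, the two pages that link only to each other are unreachable from the seed, so no trust flows to them. How seeds are chosen and how scores are turned into a spam decision are left open here, since the abstract does not specify them.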
Keywords/Search Tags:Web, Link, Techniques, Search engines, Results, Spam