Combating Search Engine Spam Using Community Discovery

Posted on: 2017-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Feng
Full Text: PDF
GTID: 2348330488459955
Subject: Software engineering
Abstract/Summary:
Querying a search engine is the dominant way for people to find useful information on the Web. Because a query usually matches a large number of pages, every search engine employs a ranking scheme to evaluate the value of web pages. Link-based ranking algorithms (e.g., PageRank), which assume that a link implies a recommendation of the target page, are the dominant ranking schemes.

Trust propagation techniques have been widely used for link-based web spam demotion. Traditional trust propagation algorithms are non-differential: a page propagates its trust score uniformly to its neighbors, without considering whether each neighbor should be trusted or distrusted. As a result, spam pages may obtain higher ranking positions. In this paper, based on the observation that spam pages are often densely connected, we propose a differential trust propagation scheme based on community discovery. First, with known spam pages as seeds, we extract communities, which are observed to consist mostly of spam pages; both global and local community discovery algorithms are used in this step. Second, we use these communities to limit trust propagation across community boundaries, i.e., the propagation of trust from a non-member to a community member is penalized by a factor. With this penalizing scheme, differential trust propagation is realized and most good-to-bad trust propagation is limited. The penalizing scheme can be combined with all kinds of trust propagation algorithms. Experimental results show that the proposed scheme, despite its simplicity, improves the performance of trust propagation algorithms such as TrustRank, LCRank, CPV and TDR.
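The penalized propagation step can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it assumes a simplified TrustRank-style iteration, and the function name `penalized_trust`, the parameters `penalty`, `damping` and `iterations`, and the choice to discard (rather than redistribute) the penalized share of trust are all assumptions made for illustration. The spam community is taken as given; community discovery itself is not shown.

```python
def penalized_trust(out_links, good_seeds, spam_community,
                    penalty=0.1, damping=0.85, iterations=50):
    """Simplified TrustRank-style propagation with a boundary penalty.

    out_links:      dict mapping each page to a list of pages it links to
    good_seeds:     set of pages trusted a priori
    spam_community: set of pages in a community grown from spam seeds
    penalty:        hypothetical factor in [0, 1] applied to trust that
                    flows from a non-member into a community member
    """
    # Collect every page that appears as a source or a target.
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)

    # Trust is injected uniformly at the good seeds.
    seed_score = {n: (1.0 / len(good_seeds) if n in good_seeds else 0.0)
                  for n in nodes}
    trust = dict(seed_score)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * seed_score[n] for n in nodes}
        for u, targets in out_links.items():
            if not targets:
                continue
            # Uniform split of u's trust over its out-links.
            share = damping * trust[u] / len(targets)
            for v in targets:
                # Assumption: only non-member -> member links are penalized,
                # and the withheld trust is dropped, not redistributed.
                crossing = u not in spam_community and v in spam_community
                nxt[v] += share * (penalty if crossing else 1.0)
        trust = nxt
    return trust


# Toy graph: one good seed links to a normal page and to a spam page;
# the two spam pages link to each other.
graph = {
    "good": ["a", "spam1"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
uniform = penalized_trust(graph, {"good"}, {"spam1", "spam2"}, penalty=1.0)
penalized = penalized_trust(graph, {"good"}, {"spam1", "spam2"}, penalty=0.1)
```

Setting `penalty=1.0` reproduces uniform propagation, so comparing `uniform` with `penalized` on the toy graph shows how the penalty lowers the trust reaching the spam community while leaving the score of the normal page `a` unchanged.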
Keywords/Search Tags: Web Spam, Trust Propagation, Community Discovery, Differential Propagation