Combating Web Spam Based On Both Trust And Distrust Propagation

Posted on: 2012-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2218330368988061
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the World Wide Web, search engines have become the dominant way for people to find useful information on the Web. Higher ranking in search results brings more traffic, and more traffic means more profit for Web site owners, which drives some owners to manipulate search engine rankings through unethical methods. This kind of manipulation is termed Web spamming. Web spam not only wastes search engine resources but also degrades the user experience, so commercial search engines have to take measures to eliminate its negative effects.

Recently, anti-spam algorithms based on trust or distrust propagation have been widely used to combat Web spam. Compared with algorithms based on content or heuristic rules, they are more robust to attacks by spammers and more computationally efficient, because they deal only with page links. However, existing trust or distrust propagation algorithms all have two serious issues. On the one hand, trust or distrust is propagated in a non-differential way; that is, the algorithm treats authorities and spam pages alike during propagation. On the other hand, it has been noted that a combined use of good and bad seeds can lead to better results, yet little work is known to have realized this insight successfully.

The TDR algorithm proposed in this paper takes the view that each Web page has both a trustworthy side and an untrustworthy side, and assigns two scores to each Web page: T-Rank, which scores trustworthiness, and D-Rank, which scores untrustworthiness. Starting from good and bad seeds, TDR simultaneously propagates T-Rank through links and D-Rank through inverse links. During propagation, the propagation of T-Rank is penalized by the target's current D-Rank, and the propagation of D-Rank is penalized by the target's current T-Rank. In this way, both trust and distrust are propagated with target differentiation, and the two problems described above are addressed.

Experimental results on the WEBSPAM-UK2007 and ClueWeb09 datasets show that TDR outperforms other typical anti-spam algorithms under various criteria.
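
The propagation scheme described in the abstract can be illustrated with a small Python sketch. The abstract does not give the TDR update formulas, so everything specific here is an assumption made for illustration: the function name `tdr_sketch`, the damping factor `alpha`, the fixed iteration count, and above all the penalty form, where trust flowing into a page is scaled by (1 - the page's current D-Rank) and distrust flowing into a page is scaled by (1 - the page's current T-Rank). It is not the thesis's actual algorithm, only one plausible reading of "penalized by the target's current D-Rank/T-Rank".

```python
def tdr_sketch(links, good_seeds, bad_seeds, alpha=0.85, iterations=20):
    """Toy trust/distrust propagation with target differentiation.

    links: dict mapping a page to the list of pages it links to.
    Returns (t_rank, d_rank) as dicts keyed by page.
    The penalty form and all parameters are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)

    forward = {n: list(links.get(n, [])) for n in nodes}
    inverse = {n: [] for n in nodes}
    for src in nodes:
        for dst in forward[src]:
            inverse[dst].append(src)

    # Seed distributions: uniform over good seeds / bad seeds.
    t_seed = {n: (1.0 / len(good_seeds) if n in good_seeds else 0.0) for n in nodes}
    d_seed = {n: (1.0 / len(bad_seeds) if n in bad_seeds else 0.0) for n in nodes}
    t, d = dict(t_seed), dict(d_seed)

    for _ in range(iterations):
        new_t = {n: (1 - alpha) * t_seed[n] for n in nodes}
        new_d = {n: (1 - alpha) * d_seed[n] for n in nodes}
        for src in nodes:
            # T-Rank flows along links; the share a target receives is
            # reduced according to the target's current D-Rank (assumed form).
            for dst in forward[src]:
                share = alpha * t[src] / len(forward[src])
                new_t[dst] += share * (1.0 - d[dst])
            # D-Rank flows along inverse links; the share a target receives
            # is reduced according to the target's current T-Rank.
            for dst in inverse[src]:
                share = alpha * d[src] / len(inverse[src])
                new_d[dst] += share * (1.0 - t[dst])
        # Score held by pages with no outgoing (or incoming) links is
        # simply dropped in this sketch.
        t, d = new_t, new_d
    return t, d


if __name__ == "__main__":
    # Hypothetical four-page chain: good seed "g", bad seed "b".
    toy_links = {"g": ["a"], "a": ["s"], "s": ["b"]}
    t_rank, d_rank = tdr_sketch(toy_links, {"g"}, {"b"})
    for page in sorted(t_rank):
        print(page, round(t_rank[page], 4), round(d_rank[page], 4))
```

In this toy chain, page "a" is linked from the good seed and page "s" links to the bad seed, so "a" ends up with the higher T-Rank and "s" with the higher D-Rank. Because both scores are updated in the same pass, each penalty uses the target's score from the previous iteration, which is one way to read "current" in the abstract.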
Keywords/Search Tags: Web Spam, Trust Propagation, Distrust Propagation