Research On Algorithms For Detecting Web Link Spam

Posted on: 2013-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Xu
GTID: 2248330371996154
Subject: Computer application technology
Abstract/Summary:
With the rapid spread of the Internet, the volume of web spam keeps growing, and it has seriously degraded the accuracy and efficiency of search engines. Identifying web spam has become one of the most serious challenges facing Internet search.

Further study of web spam shows that most of it relies on links. This thesis therefore focuses on link spam, with the goal of designing and implementing a web spam detection system. Based on a survey of link spam detection techniques, a framework for the detection system is designed, within which the page attributes are analyzed and the design of a spam classifier is studied.

First, a random forest classifier for web link spam is studied and optimized. After content-based and link-based attributes of the web pages have been comprehensively extracted, the dataset is classified in a first stage.

The "link farm" is a common form of link spam. The SpamRank algorithm is modified in this thesis for link spam detection. SpamRank weights are assigned to seed spam pages, SpamRank values are propagated in both directions between spam pages and the pages they are linked with, and the web graph is constructed and traversed; the dataset is thereby classified in a second stage. The detection results are then analyzed with the IN-OUT algorithm.

Finally, the classifier is trained on the WEBSPAM-UK2007 dataset, released for the Web Spam Challenge 2008, and the link spam detection algorithm is evaluated experimentally. The results are analyzed and compared in detail against several evaluation metrics. The experiments indicate that the system implemented in this thesis achieves its expected aim.
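
The seed-based propagation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the modified SpamRank update rule, so everything below is an assumption: the function name `propagate_spam_scores`, the damping factor, the iteration count, and the choice to treat links as undirected (to mimic values being passed in both directions) are all hypothetical. The scheme shown is an ordinary seed-biased, PageRank-style iteration, not the thesis's actual algorithm.

```python
def propagate_spam_scores(links, seeds, damping=0.85, iterations=20):
    """Spread a spam score outward from known spam seed pages.

    links: dict mapping each page to the list of pages it links to.
    seeds: set of pages already labeled as spam.
    All parameters and the update rule are illustrative assumptions.
    """
    # Build an undirected neighbor map so that score flows both ways
    # across every link (assumed reading of "with each other").
    neighbors = {}
    for src, targets in links.items():
        neighbors.setdefault(src, set())
        for dst in targets:
            neighbors.setdefault(dst, set())
            if dst != src:
                neighbors[src].add(dst)
                neighbors[dst].add(src)

    # Seeds share the initial weight equally; all other pages start at zero.
    base = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in neighbors}
    scores = dict(base)

    for _ in range(iterations):
        # Each round, a fraction of the weight is re-injected at the seeds.
        nxt = {p: (1.0 - damping) * base[p] for p in neighbors}
        for page, linked in neighbors.items():
            if not linked:
                continue  # an isolated page passes nothing on
            share = damping * scores[page] / len(linked)
            for other in linked:
                nxt[other] += share
        scores = nxt
    return scores


if __name__ == "__main__":
    # Toy graph: a, b, c form a small link farm; d and e are unrelated.
    toy_links = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
                 "d": ["e"], "e": []}
    for page, value in sorted(propagate_spam_scores(toy_links, {"a"}).items()):
        print(page, round(value, 4))
```

In this toy graph the pages tightly interlinked with the seed accumulate score, while pages with no path to a seed stay at zero; a threshold on the score could then drive a second-stage spam decision, though the thesis's own decision rule is not stated in the abstract.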
Keywords/Search Tags: Detecting Web Link Spam, Random Forest Algorithm, Detecting Link Farm, SpamRank Algorithm