Research on a Web Spam Detection Algorithm Based on Link Weight

Posted on: 2020-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: J W Zhou
GTID: 2428330590983225
Subject: Computer technology
Abstract/Summary:
With the development of computer science, people are increasingly connected to the Internet. Meanwhile, the flood of web spam, which deceives search engines and degrades users' online experience, has become a major factor harming the Internet environment. This thesis introduces the concept of web spam and surveys common spamming techniques and detection methods. Spam pages often inflate their importance in search engines through content spamming and link spamming. Existing detection algorithms can be divided into content-based algorithms, link-based algorithms and others.

This thesis proposes an improvement to link-based spam detection. First, some shortcomings of existing algorithms are analyzed. When scores are propagated, each page divides its score evenly according to its indegree or outdegree. Cases in which spam pages manage, by various means, to link to high-scoring pages or to be linked from high-scoring pages are not handled effectively. In response to these shortcomings, each link is assigned a weight, so that the score a page propagates depends on link weight: a link with a greater weight transmits a higher score. Outgoing-link spam pages try to raise their scores by linking to a large number of high-scoring pages, but the proposed algorithm distinguishes and recognizes this behavior. Incoming-link spamming works by spreading the scores of high-scoring pages to low-scoring pages; the proposed algorithm lowers the score of a high-scoring page when it links to low-scoring pages. Finally, the convergence of the proposed algorithm is proved.

In experiments on the WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets, the proposed algorithm is compared with PageRank, TrustRank and Trust-Distrust Rank under several evaluation metrics. The results show that the proposed spam detection algorithm effectively lowers the rank of spam pages among all pages and improves the efficiency of spam detection.
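
The idea of weight-proportional score propagation can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not say how link weights are computed, so the weights here are supplied by hand, and the function name `weighted_rank`, the `damping` value and the seed-based teleport (borrowed from TrustRank-style propagation) are all assumptions made for illustration. The sketch also does not model the penalty applied to high-scoring pages that link to low-scoring ones.

```python
def weighted_rank(links, seeds, damping=0.85, iterations=50):
    """Propagate scores along links in proportion to link weight.

    links: dict mapping (source, target) -> positive weight (hypothetical values)
    seeds: dict mapping page -> teleport share; shares should sum to 1
    """
    pages = set(seeds)
    for source, target in links:
        pages.add(source)
        pages.add(target)

    # Total outgoing weight per page, used to normalise each link's share.
    out_total = {page: 0.0 for page in pages}
    for (source, _), weight in links.items():
        out_total[source] += weight

    score = {page: seeds.get(page, 0.0) for page in pages}
    for _ in range(iterations):
        # Teleport step: only seed pages receive the (1 - damping) share.
        nxt = {page: (1 - damping) * seeds.get(page, 0.0) for page in pages}
        # Propagation step: a heavier link carries a larger share of the
        # source's score. Pages without outgoing links pass nothing on.
        for (source, target), weight in links.items():
            nxt[target] += damping * score[source] * weight / out_total[source]
        score = nxt
    return score


# Page "a" links to "b" with three times the weight of its link to "c".
scores = weighted_rank({("a", "b"): 3.0, ("a", "c"): 1.0}, seeds={"a": 1.0})
# With equal weights the split is even, as in indegree/outdegree-based schemes.
even = weighted_rank({("a", "b"): 1.0, ("a", "c"): 1.0}, seeds={"a": 1.0})
```

In this toy graph, "b" ends up with three times the score of "c" under the weighted links, while the equal-weight run gives both pages the same score, which is the even distribution the abstract criticizes.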
Keywords/Search Tags:Web spam, Content spamming, Link spamming, Page ranking