| Web pages gain credibility through mutual links. Some web spam pages use malicious deception to inflate their credibility, which degrades the user experience, causes large economic losses for search engines and legitimate websites, and pollutes the Internet environment. This paper introduces several common web spam cheating methods and examines the corresponding detection methods. Web spam relies on content cheating and link cheating to raise its credibility, and detection methods can accordingly be divided into link-based algorithms, content-based algorithms, and others. For link-based detection, this paper proposes an improved algorithm based on the link relationships and topic relevance of web pages. An analysis of existing link-based algorithms shows that they treat all links "equally" when transferring scores and therefore fail to detect cheating methods such as link farms and honeypots. To address these problems, the proposed algorithm first obtains the topics of web pages with an LDA topic model, then adjusts each link's weight according to the credibility of the linked page and the topic relevance between the two pages. Two link situations are distinguished: 1) a low-scoring page actively links to a high-scoring page, which lowers the high-scoring page's score; 2) a high-scoring page actively links to a low-scoring page, thereby "endorsing" it. In each case, the topic relevance of the two linked pages is compared and the score transfer is adjusted accordingly. Finally, the proposed algorithm is evaluated on the public WEBSPAM-UK2007 data set and compared with the PageRank and TrustRank algorithms using several evaluation metrics. The experimental results show that the proposed algorithm effectively demotes web spam and thereby curbs cheating by spam pages. |
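
The idea of scaling score transfer by topic relevance can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a TrustRank-style propagation from trusted seed pages, assumes each page's topic distribution is already available (for example from an LDA model), and uses cosine similarity as a stand-in for topic relevance. The function names `cosine` and `topic_weighted_trust`, the damping factor, and the weighting rule are all hypothetical, and the credibility-based adjustment described in the abstract is left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_weighted_trust(links, topics, seeds, damping=0.85, iters=50):
    """Propagate trust from seed pages, scaling each link by topic relevance.

    links:  dict mapping a page to the list of pages it links to
    topics: dict mapping a page to its topic distribution (assumed given)
    seeds:  set of trusted pages that receive the initial trust
    """
    pages = sorted(topics)
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    score = dict(seed_share)
    for _ in range(iters):
        # Every page restarts from its share of the seed trust.
        nxt = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for src, outs in links.items():
            if not outs:
                continue
            for dst in outs:
                # Assumption: a link passes less score when the two pages
                # are about unrelated topics (plain TrustRank uses weight 1).
                relevance = cosine(topics[src], topics[dst])
                nxt[dst] += damping * score[src] * relevance / len(outs)
        score = nxt
    return score


# Toy graph: trusted page "a" links to an on-topic page and an off-topic page.
topics = {"a": [0.9, 0.1], "b": [0.8, 0.2], "spam": [0.1, 0.9]}
links = {"a": ["b", "spam"], "b": ["a"], "spam": []}
scores = topic_weighted_trust(links, topics, seeds={"a"})
print(scores["b"] > scores["spam"])
```

In this toy graph the off-topic page receives less of the seed's score than the on-topic page, even though both are linked from the same trusted source. Because relevance weights below 1 discard part of the transferred score, the scores in this sketch do not sum to 1; a fuller implementation would need to decide whether to renormalize.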