
Research On Web Spam Detection Technology Based On Immune Clonal Selection

Posted on: 2015-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: F Yang
GTID: 2268330428476742
Subject: Communication and Information System
Abstract/Summary:
Web spam refers to Web pages that mislead search engines through improper means in order to rank higher than they deserve and so attract more visits, without any improvement in the quality of the pages themselves. Web spam undermines the fairness of search engine rankings and the user experience, and it also creates serious security problems for the Web. Detecting Web spam effectively and protecting the legitimate rights and interests of users has therefore become one of the top challenges for search engines, which makes research on Web spam detection technology important.
This thesis introduces the principles by which search engines rank Web pages, analyzes the types and characteristics of Web spam cheating techniques, and describes the current state of research on Web spam detection. It then introduces the principles of artificial immune systems and some common algorithms based on them, with emphasis on classification based on the immune clonal selection algorithm, a new machine learning method that is very effective for classification problems. The thesis adopts immune clonal selection to detect Web spam, which provides a new method for Web spam detection.
The immune clonal selection algorithm is capable of self-learning and adaptation and can distinguish between self and non-self. This thesis uses the clonal selection algorithm to detect Web spam and designs a framework for a Web spam detection system based on it. The system also applies a feature selection strategy to remove redundant and ineffective features, improving detection efficiency and practicality. The shortcomings of the algorithm are analyzed through experiments on the public dataset WEBSPAM-UK2006.
To address this problem, the thesis uses an improved immune clonal selection algorithm that adds antibody restraint to control the number of antibodies in each class. Experiments on the dataset show that the improved algorithm detects Web spam effectively even when the dataset is unbalanced.
Finally, the thesis uses the Bagging method to build an ensemble classifier from the improved immune clonal selection algorithm. Experimental results show that the ensemble classifier identifies Web spam more effectively than the improved immune clonal selection algorithm alone.
Keywords/Search Tags: Search engine cheat, Web Spam, Artificial immune system, Clonal selection, Classification, Ensemble Learning
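The abstract does not give the algorithm's details, so the following is only a minimal Python sketch of the general idea: a toy clonal-selection classifier with a per-class cap on antibodies, wrapped in Bagging with majority voting. Everything here is an assumption made for illustration and not the thesis's method: the function names, the parameters (`n_clones`, `sigma`, `max_per_class`), the use of squared Euclidean distance as inverse affinity, the "drop the oldest antibody" form of restraint, and the two-feature toy data (which stands in for WEBSPAM-UK2006 features).

```python
import random
from collections import Counter


def distance(a, b):
    """Squared Euclidean distance; a lower distance means a higher affinity."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_clonal(samples, labels, rng, n_clones=5, sigma=0.1, max_per_class=10):
    """Train a toy clonal-selection classifier; returns {label: [antibodies]}."""
    # Seed each class's memory pool with the first training sample of that class.
    memory = {}
    for x, y in zip(samples, labels):
        memory.setdefault(y, [list(x)])
    for x, y in zip(samples, labels):
        pool = memory[y]
        # Selection: pick the same-class antibody with the highest affinity to x.
        best = min(pool, key=lambda ab: distance(ab, x))
        # Cloning and mutation: perturb copies of the selected antibody.
        clones = [[v + rng.gauss(0.0, sigma) for v in best] for _ in range(n_clones)]
        winner = min(clones, key=lambda ab: distance(ab, x))
        # Keep the best clone only if it matches the antigen better than its parent.
        if distance(winner, x) < distance(best, x):
            pool.append(winner)
        # Antibody restraint (assumed form): cap the pool size of every class so
        # the majority class cannot swamp the minority class; drop the oldest.
        if len(pool) > max_per_class:
            pool.pop(0)
    return memory


def predict_one(memory, x):
    """Label x with the class of its nearest memory antibody."""
    candidates = ((distance(ab, x), label)
                  for label, pool in memory.items() for ab in pool)
    return min(candidates, key=lambda pair: pair[0])[1]


def train_bagging(samples, labels, n_models=5, seed=0):
    """Bagging: train each base classifier on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(samples)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train_clonal([samples[i] for i in idx],
                                   [labels[i] for i in idx], rng))
    return models


def predict_bagging(models, x):
    """Majority vote over the base classifiers."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical, unbalanced toy data: 20 "normal" pages and 6 "spam" pages,
# each described by two made-up numeric features.
normal = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(4)]
spam = [(5 + 0.1 * i, 5 + 0.1 * j) for i in range(3) for j in range(2)]
samples = normal + spam
labels = ["normal"] * len(normal) + ["spam"] * len(spam)

models = train_bagging(samples, labels, n_models=5, seed=0)
print(predict_bagging(models, (5.1, 5.0)), predict_bagging(models, (0.2, 0.1)))
```

In this sketch the cap `max_per_class` is what keeps the two classes' antibody pools comparable in size on unbalanced data, and the vote in `predict_bagging` is what combines the base classifiers. A real system would replace the toy features with selected page features and tune these parameters experimentally.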