
Web Spam Detection Based On Combination Of Multiple Classifiers

Posted on: 2011-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: T W Zhang
Full Text: PDF
GTID: 2178330332471377
Subject: Software engineering
Abstract/Summary:
Web spam refers to all deceptive actions that try to raise the ranking of a page in search engines; it is designed for search engines rather than for users. In recent years, web spam techniques have become increasingly rampant and have greatly harmed the quality of search engine results. Identifying and preventing spam is regarded as one of the top challenges for web search engines, and developing efficient web spam detection algorithms is a promising research area.

Web spam techniques can be classified into content spam, link spam, hiding spam, etc. Most existing heuristic detection methods focus on one specific type of spam; their parameters are difficult to tune, and they can easily be exploited by spammers.

This thesis focuses on the characteristics of web spam. Centering on feature selection and classifier design, the following research work has been carried out:

(1) We study the theory and application of multiple-classifier technologies, including the design criteria and application integration of multi-classifier systems, their architecture and optimization methods, and the methods of multi-classifier ensemble. Three design criteria are the primary considerations when designing such a system: the classification accuracy of the member classifiers, the diversity among the member classifiers, and the efficiency of the multi-classifier system.

(2) We analyze the characteristics of web spam. To address the diversity of web spam, we propose a multiple-classifier combination method based on feature selection and compare it with several combinations of commonly used classification methods.

(3) Combining multiple identical component classifiers (whose outputs are exactly the same for the same inputs) does not improve performance. We therefore add a step that measures the diversity of the individual classifiers, which further improves the performance of the classifier combination.

Comparison of the experimental results shows that the proposed method is effective and feasible. ...
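The point in (3) can be illustrated with a minimal sketch. The abstract does not say which combination rule or diversity measure the thesis uses, so the majority vote and the pairwise disagreement measure below are assumptions chosen for illustration, and the labels are hypothetical toy data rather than results from the thesis.

```python
from collections import Counter
from itertools import combinations


def majority_vote(predictions):
    """Combine per-classifier label lists into one voted label per sample."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def disagreement(pred_a, pred_b):
    """Fraction of samples on which two classifiers output different labels."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def mean_pairwise_disagreement(predictions):
    """Average disagreement over all classifier pairs (0.0 means no diversity)."""
    pairs = list(combinations(predictions, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


def accuracy(pred, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical toy labels: 1 = spam, 0 = non-spam.
truth = [1, 1, 0, 0, 1, 0]

# Hypothetical outputs of three classifiers, each imagined as trained on a
# different feature subset (content, link, hiding features).
content_clf = [1, 1, 0, 1, 0, 0]
link_clf = [1, 0, 0, 0, 1, 0]
hiding_clf = [0, 1, 0, 0, 1, 1]

identical = [content_clf, content_clf, content_clf]
diverse = [content_clf, link_clf, hiding_clf]

for name, group in (("identical", identical), ("diverse", diverse)):
    voted = majority_vote(group)
    print(name,
          "diversity=%.2f" % mean_pairwise_disagreement(group),
          "accuracy=%.2f" % accuracy(voted, truth))
```

In this toy setup, voting over three copies of the same classifier reproduces that classifier's errors, while the three differing classifiers make their errors on different samples, so the vote corrects them. Real classifiers rarely complement each other this cleanly; the sketch only shows why a diversity check before combination is worthwhile.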
Keywords/Search Tags: Web spam, Multiple classifiers integration, Feature selection, Diversity measures, Feature segmentation