Research On The Method Of Detecting And Grading Web Spam Using Web Quality Features

Posted on: 2014-01-24; Degree: Master; Type: Thesis
Country: China; Candidate: F L Li; Full Text: PDF
GTID: 2248330398475243; Subject: Computer technology
Abstract/Summary:
With the explosive growth of the Internet, search engines have become one of the Web applications people use most often to gather information. Web spam not only reduces the efficiency and reputation of search engines, but may also expose users to malicious attacks or financial loss, while legitimate websites lose large numbers of visitors and see their interests seriously harmed. How to detect Web spam effectively, and so protect information security, has therefore become a major challenge for search engines.

Web spam detection is essentially a classification problem. Classical detection methods extract features from websites, train a classification model on those features, and use the model to classify unlabeled websites. This process has two weaknesses. First, only content features and link features are extracted; quality features are not considered. Second, the classification result only indicates whether a website is Web spam, without grading how harmful it is. Much research has shown that websites with higher authority usually have better-quality pages, while spam pages are of poor quality. In addition, grading spam by the harm level of its content can help search engines choose a more appropriate filtering policy. This thesis therefore combines content features, link features, and quality features to detect spam pages, grades the detected spam pages according to the level of harm caused by their content, and designs a Web spam detection and grading system.

Finally, several comparative experiments are designed to validate the system on the WEBSPAM-UK2007 web page collection and a Chinese web page collection. The results show that the detection system proposed in this thesis achieves satisfactory results.
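
The abstract does not give the feature definitions, the classifier settings, or the grading rule, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes each page is described by one hypothetical content feature, one link feature, and one quality feature, trains a small AdaBoost ensemble of decision stumps (AdaBoost appears in the keywords), and then assigns a grade from a hypothetical content-harm value `harm` in [0, 1]. All data values, thresholds, and function names are illustrative assumptions.

```python
import math

def train_stump(X, y, w):
    """Return the (error, feature, threshold, polarity) stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[j] >= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] >= thr else -pol

def adaboost_fit(X, y, rounds=5):
    """Train AdaBoost on labels in {+1 (spam), -1 (normal)}."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the stump weight stays finite on separable toy data.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def spam_score(model, row):
    """Weighted vote of all stumps; a positive score means 'spam'."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in model)

def grade(model, row, harm):
    """Return 0 for a normal page, or 1-3 for spam by harm level.

    `harm` and the 0.33 / 0.66 cut-offs are hypothetical, not from the thesis.
    """
    if spam_score(model, row) <= 0:
        return 0
    if harm < 0.33:
        return 1
    if harm < 0.66:
        return 2
    return 3

# Toy rows: [content feature, link feature, quality feature] (made-up values).
X = [
    [0.90, 0.80, 0.10],
    [0.85, 0.70, 0.20],
    [0.70, 0.90, 0.15],
    [0.20, 0.10, 0.90],
    [0.30, 0.20, 0.80],
    [0.10, 0.30, 0.85],
]
y = [1, 1, 1, -1, -1, -1]

model = adaboost_fit(X, y)
print(grade(model, [0.80, 0.75, 0.10], harm=0.9))
print(grade(model, [0.15, 0.20, 0.90], harm=0.9))
```

Separating the spam decision from the grading step mirrors the two-stage description in the abstract: a page is first classified, and only detected spam receives a harm grade that a filtering policy could act on.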
Keywords/Search Tags: Web Spam Detection, Web Quality Feature, Detecting and Grading Method, Classification Algorithms, AdaBoost Algorithm