Research and Implementation of Hidden Web Spam Detection Technology

Posted on: 2013-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhou
Full Text: PDF
GTID: 2248330395953373
Subject: Computer application technology
Abstract/Summary:
With the development and popularization of the Internet, the volume of Web information has grown explosively, and search engines have become an indispensable tool for finding the required information within it. However, the top-ranked search results are often not what users actually need. Spammers use black hat SEO techniques to create web spam: pages that are unrelated to the query and may even defraud users. This wastes users' time and damages the reputation of search engine companies. Among the kinds of web spam, hidden web spam is concealed, deceptive, and difficult to detect, and it has become a serious open problem in the field of spam page detection.

This thesis reviews current hidden spam page detection technology in China and abroad. It describes in detail the technical characteristics and types of hidden spam pages, with a focus on how redirection is implemented. It summarizes the various redirection phenomena and analyzes their characteristics and causes in detail. It also describes several typical cloaking and redirection detection techniques proposed by Chinese and foreign scholars.

Based on the redirection phenomena summarized, this thesis proposes a redirection detection algorithm and designs and implements a framework that can effectively detect redirection in search results. It also builds a Chinese junk-word lexicon and a Chinese sample data set for redirection detection. The experimental results are analyzed in detail in terms of the confusion matrix, keywords, the type of camouflaged web page, the type of redirection, and spam sites.
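As a rough illustration of what detecting redirection in a fetched search result can involve, the sketch below checks a response for three common redirection mechanisms: an HTTP 3xx status with a Location header, a meta refresh tag, and a script that assigns to `location`. It is not the algorithm proposed in the thesis, which the abstract does not spell out. The function name `find_redirects`, the category labels, and the regular expressions are assumptions made for this example, and the JavaScript check is a simple heuristic that will miss obfuscated scripts.

```python
import re
from html.parser import HTMLParser


class _MetaRefreshParser(HTMLParser):
    """Collects target URLs from <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/".
        match = re.search(r"url\s*=\s*['\"]?([^'\";]+)", attr.get("content", ""), re.I)
        if match:
            self.targets.append(match.group(1).strip())


# Heuristic (assumed) pattern for script-driven redirects such as
# window.location = "...", location.href = "...", or location.replace("...").
_JS_REDIRECT = re.compile(
    r"(?:window\.|document\.|top\.|self\.)?location(?:\.href)?\s*=\s*['\"]([^'\"]+)['\"]"
    r"|location\.(?:replace|assign)\(\s*['\"]([^'\"]+)['\"]",
    re.I,
)


def find_redirects(status, headers, body):
    """Return a list of (mechanism, target_url) pairs found in one response.

    status  -- HTTP status code of the response
    headers -- mapping of response header names to values
    body    -- response body as text (HTML)
    """
    findings = []

    # 1. Server-side redirection: 3xx status plus a Location header.
    if 300 <= status < 400:
        location = next(
            (v for k, v in headers.items() if k.lower() == "location"), None
        )
        if location:
            findings.append(("http", location))

    # 2. Meta refresh redirection declared in the HTML.
    parser = _MetaRefreshParser()
    parser.feed(body)
    findings.extend(("meta_refresh", target) for target in parser.targets)

    # 3. Script-driven redirection with a literal target URL.
    for m in _JS_REDIRECT.finditer(body):
        findings.append(("javascript", m.group(1) or m.group(2)))

    return findings
```

A detector built this way would still need to compare the redirect target with the page the search engine indexed before labeling a result as spam, since many legitimate sites also redirect.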
Keywords/Search Tags: Web Spam, Hidden Web Spam, Cloaking, Redirection