Internet Noisy Link Identification & Filtering And Application On Anti Web Spam Research

Posted on: 2010-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: J B Chu
Full Text: PDF
GTID: 2178360275491820
Subject: Computer software and theory
Abstract/Summary:
Link-based ranking algorithms play a crucial role in modern search engines. These algorithms rest on the "link as vote" hypothesis, under which a link from one page to another counts as a vote for the target page. After more than ten years of Internet growth, however, this assumption no longer holds universally, and web pages no longer simply "vote" for one another. A variety of other links (i.e., noisy links) now exist, and they reduce the accuracy of link-based ranking algorithms. How to identify and handle these noisy links is an active topic in international research. This thesis proposes a purely link-based method that identifies and filters noisy links automatically, and verifies the approach with detailed experiments. The results show that the method identifies and filters noisy links effectively and improves ranking considerably: P@20 (the number of relevant results among the top 20) rises from an average of 11.8 to 16.4. We then apply the method to the study of Web spam. In experiments on common data sets published abroad, it filters out the majority of spam sites, and it is competitive with several well-known algorithms. These results verify the use of noisy-link identification and filtering in anti-Web-spam research.
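The P@20 figure above can be illustrated with a minimal sketch. It assumes, as the abstract's definition suggests, that P@20 is reported as a count of relevant results among the top 20 and then averaged over queries. The function names, the toy URLs, and the relevance judgments are hypothetical and are not taken from the thesis.

```python
from typing import Iterable, List, Set, Tuple


def relevant_in_top_k(ranked: Iterable[str], relevant: Set[str], k: int = 20) -> int:
    """Count how many of the first k ranked results are judged relevant."""
    top_k = list(ranked)[:k]
    return sum(1 for item in top_k if item in relevant)


def mean_relevant_in_top_k(runs: List[Tuple[List[str], Set[str]]], k: int = 20) -> float:
    """Average the per-query count over (ranking, relevant set) pairs."""
    counts = [relevant_in_top_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(counts) / len(counts)


# Hypothetical toy data: two queries, each with a ranking and a set of relevant pages.
toy_runs = [
    (["a", "b", "c", "d"], {"a", "c", "z"}),
    (["x", "y"], {"y"}),
]
toy_mean = mean_relevant_in_top_k(toy_runs, k=3)
```

Dividing the count by k would give the more common fractional form of precision at k; the count form is used here only to match the scale of the numbers quoted in the abstract.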
Keywords/Search Tags: Search Engine, Sorting, Noisy Link, Web Spam, WWW