
Research on a Filtering Algorithm for Web Spam in Search Engines

Posted on: 2014-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: T He
Full Text: PDF
GTID: 2428330488999539
Subject: Computer technology
Abstract/Summary:
As the internet continues to grow, the search engine has become an important tool for obtaining information from it. However, despite their rapid development, search engines still have shortcomings. Most people in China know little about Web spam, yet Baidu, Google and other search engines have long been plagued by it. By the author's estimate, Web spam may account for as much as 50% of web pages in China. Although that proportion is declining, the absolute number of spam pages continues to increase. Web spam is not meant to provide valuable information to visitors; its purpose is to deceive the search engine for gain. Its authors disregard page quality and instead use a variety of techniques to push their pages to higher rankings. Web spam therefore threatens the fairness of search engine ranking, seriously degrades the user's search experience, wastes users' time, and can damage the reputation of the search engine company. The problem to solve is how to distinguish high-quality web pages from Web spam.

The core problem is how Web spam affects page ranking. The main ranking algorithm in search engines is PageRank. In PageRank, the PR value of a page is distributed evenly among the pages it links to. On the real web, however, pages differ in quality, so a method for detecting Web spam is needed. This thesis proposes a new filtering algorithm for Web spam.

The main work is as follows. First, to improve the quality of search results, the impact of Web spam on those results must be eliminated. The PageRank algorithm is vulnerable to interference from Web spam, which reduces the accuracy of search results. We design an algorithm that detects Web spam based on the techniques spam pages use: it first generates the HTML tag tree, then traverses the tree depth-first, and decides whether the page is Web spam according to the contents of the first two data fields. Second, we study the PageRank algorithm. Standard PageRank does not transfer a lower value to Web spam, which is unfair to the search results. We modify PageRank by evaluating different kinds of Web spam so that a lower value is transferred to spam pages.
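
The PageRank step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual weighting formula, so everything specific here is an assumption made for illustration: the function name `pagerank`, the `spam_score` mapping (0 for a clean page, 1 for certain spam), the damping factor of 0.85, and the rule that each link target receives a share proportional to `1 - spam_score`. With no spam scores the code reduces to the even distribution of standard PageRank.

```python
from typing import Dict, List, Optional


def pagerank(
    links: Dict[str, List[str]],
    spam_score: Optional[Dict[str, float]] = None,
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Iterative PageRank with an optional, hypothetical spam penalty.

    links      -- maps each page to the list of pages it links to
    spam_score -- ASSUMED input: page -> score in [0, 1]; missing pages count as 0
    """
    spam_score = spam_score or {}
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            targets = links.get(src, [])
            # Weight 1 for a clean target, smaller for a suspected spam target.
            # With all weights equal to 1 this is the even split of plain PageRank.
            weights = [1.0 - spam_score.get(t, 0.0) for t in targets]
            total = sum(weights)
            if total == 0.0:
                # No usable out-links: spread this page's rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[src] / n
                continue
            for target, weight in zip(targets, weights):
                new_rank[target] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


# Toy graph: page "a" links to a normal page "b" and to a page flagged as spam.
toy_links = {"a": ["b", "spam"], "b": ["a"], "spam": ["a"]}
plain = pagerank(toy_links)
penalized = pagerank(toy_links, spam_score={"spam": 0.9})
```

In the toy graph, `b` and `spam` receive identical rank under the even split, while the penalized run shifts most of the share from `a` toward `b`. The total rank still sums to 1 because the penalty only redistributes a page's outgoing value among its targets.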
Keywords/Search Tags: Web spam, PageRank, Search Engine