Link Analysis Based Page Ranking Improvement And Related Link Spam Algorithm

Posted on: 2012-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: X F Chen
Full Text: PDF
GTID: 2178330335497719
Subject: Computer software and theory
Abstract/Summary:
With the growth of the Web and the explosion of online information, more and more content of uneven quality floods the Web. Finding the most satisfying results in this sea of information is therefore increasingly important and demands increasingly sophisticated techniques. Search engines have become an important source of page views. As link spamming increases, filtering spam pages and returning high-quality, relevant results has become a great challenge for current web search engines.

PageRank and HITS are two of the most important link-based ranking algorithms and have been used in commercial search engines. In PageRank, however, the PageRank value of a page is distributed evenly among all the pages it links to, and quality differences between pages are ignored entirely during distribution. This makes the algorithm vulnerable to web spam, and to link spamming in particular. This thesis therefore proposes an improved PageRank algorithm, named Page Quality Based PageRank (QPR). During the iterative process, QPR continually evaluates the quality of every web page from its PR value and the link structure of the Web, and distributes each page's PR value to the pages it cites according to page quality.

Extensive experiments show that QPR is effective at returning high-quality, relevant results, but that it is weak at filtering spam pages. Many studies now show that spam pages collude with one another, so analyzing the structural characteristics of their links has become an important way to understand and filter them. We therefore assume that the link structures of spam pages have much in common. Based on this assumption, we present a link-based filtering method for web spam: we first cluster all pages according to the similarity of their link structure, and then down-weight the corresponding links in order to filter out the spam pages. We conducted many experiments on multiple data sets; the results show that QPR, combined with the link-based filtering method, is very effective both at improving the quality of search results and at filtering spam pages.
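
The abstract gives no formulas, so the following Python sketch only illustrates the two ideas under stated assumptions and is not the thesis's actual algorithm. It assumes that a page's "quality" can be approximated by its current rank, that a page's rank is split among its out-links in proportion to the quality of the target pages, and that a pairwise Jaccard similarity over link neighborhoods can stand in for the clustering step. The names `jaccard`, `link_weights`, `rank_pages`, `threshold` and `penalty`, as well as the toy graph, are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_weights(links, threshold=0.8, penalty=0.1):
    """Assumed filtering rule: a link u -> v keeps weight 1.0 unless the link
    neighborhoods of u and v (out-links plus the page itself) are nearly
    identical, in which case it is down-weighted to `penalty`."""
    weights = {}
    for u, outs in links.items():
        for v in outs:
            sim = jaccard(outs | {u}, links.get(v, set()) | {v})
            weights[(u, v)] = penalty if sim >= threshold else 1.0
    return weights


def rank_pages(links, weights=None, by_quality=True, damping=0.85, iters=100):
    """Power iteration. With by_quality=False each page splits its rank evenly
    among its out-links (plain PageRank); with by_quality=True the split is
    proportional to the current rank of each target (assumed quality proxy)."""
    pages = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        received = {p: 0.0 for p in pages}
        spread = 0.0  # rank of dangling pages and down-weighted links
        for u in pages:
            outs = links.get(u, set())
            if not outs:
                spread += rank[u]
                continue
            quality = {v: rank[v] if by_quality else 1.0 for v in outs}
            total = sum(quality.values())
            for v in outs:
                share = rank[u] * quality[v] / total
                w = 1.0 if weights is None else weights.get((u, v), 1.0)
                received[v] += share * w
                spread += share * (1.0 - w)
        # Teleport term plus received rank; `spread` is shared by all pages.
        rank = {p: (1 - damping) / n + damping * (received[p] + spread / n)
                for p in pages}
    return rank


# Toy graph: four ordinary pages plus a three-page link farm (s1, s2, s3).
links = {
    "a": {"b", "c"}, "b": {"c"}, "c": {"a"}, "d": {"c"},
    "s1": {"s2", "s3"}, "s2": {"s1", "s3"}, "s3": {"s1", "s2"},
}


def farm_rank(rank):
    """Total rank held by the link farm."""
    return sum(rank[p] for p in ("s1", "s2", "s3"))


uniform = rank_pages(links, by_quality=False)
quality = rank_pages(links, by_quality=True)
filtered = rank_pages(links, weights=link_weights(links))

print("b under even vs. quality split:", uniform["b"], quality["b"])
print("farm rank before vs. after filtering:",
      farm_rank(quality), farm_rank(filtered))
```

In this toy graph, page `a` links to both `b` and `c`; the quality-based split sends more of `a`'s rank to the better-linked `c`, while the down-weighting step reduces the rank that the mutually linked farm pages can pass to one another. Redistributing the down-weighted rank uniformly is a design choice made here to keep the ranks summing to one.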
Keywords/Search Tags: Link Analysis, Web Spam, Web Data Mining, Page Rank, WWW