Font Size: a A A

The Sorting Of Web Pagerank Algorithm And Hits Algorithm Research

Posted on:2013-08-04Degree:MasterType:Thesis
Country:ChinaCandidate:Y XieFull Text:PDF
GTID:2248330374485495Subject:Computational Mathematics
Abstract/Summary:PDF Full Text Request
In this paper, we concern about the web searching problem. Algorithms such as Kleinberg’s HITS algorithm and the PageRank algorithm of Brin and Page use the link structure of a network of web pages to assign weights to each page in the network. This paper aims at the modification of these two algorithms. For the PageRank algorithm, this paper partitions the web pages into three classes, and exploits a simpler block structure of the web. For the HITS algorithm, this paper focuses on the role of co-citation in defining authorities. Then, we present a new hyperlink weighting scheme. The new model combines both features of HITS and PageRank. The main results are as follows:1. The first chapter introduces the source of the problem and the significance of the web searching in our life. The two most popular ranking algorithms (PageRank and HITS) are given in the form of linear matrix equality. We also introduce some important methods to show significant improvements over the two algorithms respectively.2. We propose two contributions for the modification of PageRank. First, we observe that the nested block structure of the web has some nice properties. Thus, we partition the web pages into three classes, and further exploit a simpler block structure of the web. The web pages are usually separated into two classes:dangling nodes and nondangling nodes, while this paper partitions them into three classes:dangling nodes, common nodes, and general nodes. The reordered hyperlink matrix appears to be much simpler. Second, we propose an algorithm and reduce the computation of the PageRank vector to that of solving a series of much smaller systems separately in each iterration. We present the results of some experiments, showing that this algorithm speeds up the computation of PageRank, and the efficiency of this algorithm increases as the number of common nodes decreases. It also shows that in practice, both algorithms are guaranteed to outperform the original algorithm, i.e., Algorithm1, as long as an appropriate block structure of web exists.3. For the HITS algorithm, we emphasize the role of co-citation in defining authorities. Then, we observe that in the HITS algorithm, if two distinct web page i and j are co-cited by many other webpages, then i, j are likely to be related in some sense or have certain commonality. According to this close relationship in HITS, we come to the conclusion that the higher the number of webpages co-cites webpages i and j is, the stronger the relevancy between the two pages. The page j with stronger relevancy should obtain more values from the page i. Therefore, we propose a new hyperlink weighting scheme to describe the strength of the relevancy between any two webpages. Then we combine hyperlink weight normalization and random surfing schemes as used in PageRank to justify the new model. In the new model (MBCC), the pages with stronger relevancy are assigned higher values, not just depend on outlinks. This model combines both features of HITS and PageRank. Finally, we present the results of some numerical experiments, showing that the MBCC ranking agree with the HITS ranking, especially in top20. Meanwhile, MBCC keeps the superiority of PageRank, i.e., existence and uniqueness of ranking vectors.
Keywords/Search Tags:PageRank, common nodes, HITS, co-citation, hyperlink weightingscheme
PDF Full Text Request
Related items