Font Size: a A A

Research On The Web Graph Compression Based On The BV Algorithm

Posted on:2015-04-16Degree:MasterType:Thesis
Country:ChinaCandidate:F ShiFull Text:PDF
GTID:2298330431965755Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The web pages and hyperlinks among them can be modeled as a huge directedgraph, which is called the Web Graph. The analysis of the Web Graph can be used torank pages, fight web spam, detect communities and mirror sites, etc. However, thisstudy is prevented by the need to store huge graphs in the external memory, whichhampers efficient random access to nodes. Therefore, the study of the Web Graphcompression stored in the memory is of great significance.Web Graph has the locality and the similarity, and possesses the power-lawdistribution. The BV algorithm is one of the most efficient compression schemes forthe Web Graph, and applies Zeta-k coding. This paper presents two improved WebGraph compression algorithms based on the BV method: the approximate optimalreference list compression algorithm and the merging optimal reference listcompression algorithm. They both apply the reference compression technology, andthe key is to select the optimal reference list. Differently, the first is no longer limitedin the sliding window to select the optimal adjacency list, and recalls along thereference chain to search approximate global optimal reference list; the second is bycombining the fixed number of nodes to generate a block node, which is used as theoptimal reference list.This article selects five test sets, which are used in testing three compressionalgorithms. The main test content includes two basis points: bits needed for storage ofeach link and the time required for random access to every node. Finally, throughcomparing and analyzing the experimental results of three algorithms, it’s proved thatimproved methods advance the compression effect, and cut the length of the referencechains, which increases the efficiency of the random access to node.
Keywords/Search Tags:Web Graph Compression, Power-law Distribution, Zeta-k Coding, BV Algorithm
PDF Full Text Request
Related items