Font Size: a A A

The Representation And Compression Of Web Graph

Posted on:2015-12-13Degree:MasterType:Thesis
Country:ChinaCandidate:M H ChaiFull Text:PDF
GTID:2298330431465768Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Web Graph is an abstract of the relationship between pages on the Internet, itsnodes are pages and directed edges are hyperlinks. As there are a large number of pageson the Internet, it should take much space to store the Web Graph, and when we areperforming related calculations on them, limited memory will severely limits theefficiency of the algorithm, which has led the compression of Web Graph to be one ofthe hottest issues on the study in recent years.By studying several characteristics of Web Graph, a project to compress WebGraph based on those characteristics has been proposed, which could get the tradeoff incompression and random access time by setting different parameters. To improve thecompression, many new ideas and algorithms have been proposed in this project, suchas Golomb like encoding, Run_length&Golomb like encoding and so on, as well assuccessfully controlled the auxiliary data file of Huffman encoding in a small range. Byanalyzing the processes of querying and uncompressing based on compressed WebGraph, according to the problem of reference chain is too long when querying, analgorithm to shape and optimize the reference chain have been proposed, whichsignificantly reduce the time of querying nodes, and according to the locality principlewhen uncompressing Web Graph, Cache accelerated algorithm have been proposed inthis project, which also greatly accelerated the speed of uncompress Web Graph.This paper use several famous Web Graph data to test the compression and randomaccess time, and all of which achieved ideal results, and only need about1.7bits for peredge, even1.1bits for some graphs. By setting a similar set of parameters, this paperpresents the comparation between BV algorithm and this project. The result shows thatthe effect of this project is slightly better than BV algorithm on the compression, buttakes a little longer time on random access.
Keywords/Search Tags:Web Graph, Compression, BV Algorithm, WebGraph Framework
PDF Full Text Request
Related items