Font Size: a A A

The Parallel-based For Web Graph Compression Representation Algorithm

Posted on:2015-03-27Degree:MasterType:Thesis
Country:ChinaCandidate:J Q ZhaoFull Text:PDF
GTID:2268330425488828Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Web graph representation is a kind of graph which consists of all the Web pages with the hyperlinks between them. It can be used to extract relevant information from the study of those links between Web pages and commonly used as the basis for crawling algorithm, searching and community discovery. But the main problem of solving some problems using Web graph representation is the size of the graphs. As they represent the whole Web, which contains billions of Web pages, Web graphs are large and their plain representation cannot be completely stored in the current main memories. The method of Web graph compression presentation is mainly studied in this article. On the basic of k2-tree algorithm, an improved layer-based method is presented. The limitation of matrix partition is improved. Meanwhile, this article uses the MapReduce programming model to parallelize the algorithm.Firstly, introducing several compressed representations and statistical property of Web graphs in details. Aimed at the k2-tree compression presentation algorithm, the influence that matrix segmentation for the space and query time is analyzed, combining with the distribution property of the Web graph, a Layer-based k2-tree compression algorithm is analyzed. The original algorithm about matrix segmentation on the upper structure and leaf node is improved. A better balance is obtained on the space and query time.Secondly, Based on the above algorithm, this paper uses the MapReduce programming model to parallelize algorithm, for the large-scale graph data, the algorithm can be constructed in real time.Finally, through the experiment on different open dataset, validated the layer-based method, we get a better query time that guaranteed space compression rate. Meanwhile we do some experiment on multicore compute and the best speedup ratio is9.7.
Keywords/Search Tags:Web graph, compact, MapReduce, parallel compute
PDF Full Text Request
Related items