Font Size: a A A

Compression And Query Algorithms For Large-scale Graphs

Posted on:2021-02-01Degree:MasterType:Thesis
Country:ChinaCandidate:J L ZhaoFull Text:PDF
GTID:2480306050970519Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the growing scale of graphs,traditional graph representations can not adapt to memory,resulting in inefficient operation of classic graph algorithms,which poses a huge challenge to the analysis and mining on graphs.Especially for storage-constrained devices,processing large graph is more difficult.The graph compression algorithm reduces the storage space required by the graph by constructing the compressed representation of the graph,thereby making the graph adapt to memory,and thereby ensuring the efficient operation of the graph algorithm,which is the key technology to meet the above challenges.Thus,studying the graph compression algorithm is of great significance to solve a series of problems such as the large storage space required for graphs caused by the surge in the amount of graph data,the low operation efficiency of graph algorithms,and the difficulty of analysis and mining based on graph algorithms.Given the graph G=(V,E),use n=|V|,m=|E| to denote the number of vertices and edges of G,respectively.This thesis focuses on the compression and query algorithms for real-world graphs.Real-world graphs are sparse graphs modeled by entities in the real world and their relationships,satisfying m=O(n).This thesis first proposes a graph compression algorithm named RCPA(Reorder and Compress Position Array)that supports fast Out-Neighbor query.The RCPA constructs the two-level compressed structure to store the absolute position of Is in the adjacency matrix,adopts the hybrid encoding strategy to reduce the compression ratio and designs the decoding fast table to speed up the query.The RCPA can represent G in 2n+nlogn+o(n)bits using O(n2)time and answer the Out-Neighbor query in O(logn+dmax)time,where dmax is the maximum degree of nodes.In addition,this thesis gives the greedy algorithm for reordering nodes in G,which can change the data distribution of the adjacency matrix so that the Is in the adjacency matrix is clustered in the upper and left parts of the matrix and enable efficient compression on G.To support more complex graph mining tasks,a graph compression algorithm named RCSM(Reorder and Compress Sub-Matrix)that supports both In-Neighbor and Out-Neighbor queries is also proposed in this thesis.The RCSM divides the adjacency matrix into sub-squares and sequentially compresses each non-zero sub-squares in a row-major order.For each non-zero sub-matrix,record the sub-matrix number and convert the matrix number to the matrix number in column-major order to support the In-Neighbor query.The RCSM can represent G in(?)time and answer the Out-Neighbor query and In-Neighbor query in(?)time.The experimental results of nodes reordering algorithm on the experimental data show that the nodes reordering algorithm proposed in this thesis plays a positive role in the two graph compression algorithms proposed in this thesis,reducing the compression ratio and shortening query time of the compression algorithm on the experimental data.Comparision experiments of the two compression algorithms with state-of-the-art on experimental data show that in the case of only supporting the Out-Neighbor query,considering the compression ratio and query time comprehensively,the overall performance of RCPA algorithm is optimal;when it is necessary to support both In-Neighbor query and Out-Neighbor query,the RCSM algorithm has the best compression performance among the compared algorithms,the query speed is faster,and it has certain advantages.
Keywords/Search Tags:Graph compression, Compressed represention, Nodes reordering, In-Neighbor query, Out-Neighbor query
PDF Full Text Request
Related items