Font Size: a A A

Research On Query Preserving Oriented Graph Aggregation Algorithm

Posted on:2021-04-15Degree:MasterType:Thesis
Country:ChinaCandidate:R BingFull Text:PDF
GTID:2428330623482029Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Graph data is often used to describe the relationship between entities in real life.In graph data,nodes represent entities,while edges represent relationships between entities,such as relationships between users in social networks,roads in traffic networks,web pages in Web graphs,etc.With the development of these fields and the increase of data,the scale of graph data is increasing,it is impossible to process and analyze these graph data directly by naked eye.In order to save storage space and facilitate analysis and query of graph data,large-scale graphs need to be compressed to facilitate visualization and analysis of graph data.Therefore,graph aggregation technology has become a research hotspot.The main purpose of query-oriented graph aggregation is to keep the query information in the original graph,such as reachability relationship,neighborhood relationship,node distance,etc.Through the summary and analysis of the current situation of graph aggregation technology,this thesis focuses on two aspects: distance query-oriented graph aggregation and reachability protection graph aggregation,and achieves the following research results:1.We propose the weighted graph aggregation algorithm via similarity of both structure and attribute: In view of the fact that the existing algorithm does not consider the graph data with node attributes and edge weights at the same time,a weighted graph aggregation algorithm combining structure and attribute similarity is proposed.In this method,firstly,a pruning strategy is used to quickly judge the structural similarity of the closed neighborhood between nodes.Then we remove the node pairs with different structure.And we calculate the structural similarity of the remaining node pairs.Secondly,the minimum hash technology is used to calculate the attribute similarity between node pairs with similar structure.Thirdly,the joint similarity between node pairs is obtained by combining the structural similarity and attribute similarity between node pairs.Finally,the nodes with high joint similarity are selected to merge,and the edge weight value between new nodes after aggregation is calculated.2.We propose the distance query-oriented attribute weighted graph aggregation algorithm: For the task of distance query between nodes on aggregation graph,an attribute weighted graph aggregation algorithm for distance query is proposed,which can be used to query and store weighted graph and unauthorized graph,and the clustering algorithm has the least impact on the distance between nodes.Specifically,when two nodes are merged into a super point,a set of equations is introduced to minimize the distance difference caused by the merger.The basic principle is to make the average error equal to zero.Firstly,pruning strategy is used to filter out node pairs with different structures and calculate the exact structural similarity between the remaining node pairs.Secondly,attribute entropy is used to measure the consistency of the internal attributes of the super point;then,according to the obtained structural similarity value and attribute entropy value,the quality score between node pairs is calculated to determine whether the node pair is combined.Finally,through the simultaneous equations and solving the equations,the weight value of the new edge is given to minimize the influence of the new edge weight on the distance between nodes.3.We propose the highly compact reachability preserving graph compression with corrections: For the task of performing reachability query on aggregation graph,a highly compact aggregation of reachability protection graph with modified table is proposed.First of all,the nodes in the original graph are divided into equivalent classes by the reachability equivalence relationship.Then,each pair of equivalent classes is further compressed according to the similarity of its predecessor and successor nodes.In addition,a set of modifications is used to maintain the reachability of the nodes in the original graph.This method generates a small aggregate graph,which only keeps the information related to the reachability query,but not the whole original graph,so as to obtain a better compression ratio.Any algorithm used to evaluate the reachability query can be directly executed on the aggregation graph generated by this method without decompressing the aggregation graph.In addition,in order to deal with the dynamic changes of the original graph,this thesis also proposes an incremental reachability preserving compression algorithm,which updates the compressed graph and the modified table,so that we can update the original graph without decompressing once to maintain the reachability of the original graph.
Keywords/Search Tags:Graph aggregation, Graph query, Similarity calculation, Entropy, Minhash
PDF Full Text Request
Related items