Research On Key Technologies Of Parallel Graph Summarization

Posted on: 2021-12-31  Degree: Master  Type: Thesis
Country: China  Candidate: Z P Yang  Full Text: PDF
GTID: 2518306305472684  Subject: Master of Engineering
Abstract/Summary:
With the popularity of the Internet, the number of users grows day by day, generating a large amount of data and leading us into the era of big data. Most of the data generated on the Internet is modeled as graphs. Because these graphs are so large, traditional centralized methods can no longer process them within a reasonable time, and distributed computing platforms are increasingly used to process big graphs in parallel. The processing of graphs in the computing field is called graph computing. As an important part of graph computing, graph summarization is a key technique for big data analysis and processing: it condenses large-scale graphs into more concise representations, reducing their scale and complexity. Graph summarization has a wide range of applications, such as social network analysis and graph visualization. It also supports other techniques for analyzing and processing large-scale graphs, and has important research significance and application value. Most existing graph summarization methods are centralized and ignore the node attributes and the relationships between nodes, which leads to accuracy and performance defects when dealing with large-scale attributed graphs in the real world. This paper addresses these shortcomings and proposes a parallel graph summarization algorithm for attributed graphs based on node aggregation, which fully considers attributes and relationships during summarization. The algorithm includes a method for calculating the node merge error increment that is suited to attributed graphs, and uses a heuristic to dynamically determine the node merge error threshold in each super-step. Only node pairs that meet the merge error threshold are merged, which keeps the graph summarization error as small as possible and ensures the quality of the generated summaries. The parallel graph summarization algorithm is implemented on the distributed computing platform Spark. Simulation experiments on a variety of real and synthetic graph datasets, evaluated with multiple indicators, verify its effectiveness and scalability. In addition, this paper performs summary-based graph queries and compares them with the query results on the original graphs to verify the acceleration effect of graph summarization. Experimental results show that the proposed parallel graph summarization algorithm can efficiently generate high-quality summaries, scales well, and significantly accelerates graph queries.
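To make the idea of threshold-gated node merging more concrete, the following is a minimal single-machine sketch of one merge round. It is not the algorithm from the thesis: the abstract does not give the merge error increment formula or the threshold heuristic, so the cost function (Jaccard distance on neighbor sets plus an attribute mismatch penalty), the quantile-based threshold, and all names (`merge_cost`, `round_threshold`, `merge_round`, `attr_weight`, `quantile`) are assumptions made for illustration. A Spark implementation would distribute the scoring step across partitions rather than enumerate all pairs in one process.

```python
from itertools import combinations


def merge_cost(adj, attrs, u, v, attr_weight=0.5):
    """Hypothetical merge cost in [0, 1]; lower means u and v are more alike.

    Assumption: cost mixes Jaccard distance between the neighbor sets
    (excluding u and v themselves) with a 0/1 attribute mismatch penalty.
    """
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    union = nu | nv
    structural = 1.0 - len(nu & nv) / len(union) if union else 0.0
    attribute = 0.0 if attrs[u] == attrs[v] else 1.0
    return (1.0 - attr_weight) * structural + attr_weight * attribute


def round_threshold(costs, quantile=0.25):
    """Hypothetical heuristic: use a low quantile of this round's costs."""
    ordered = sorted(costs)
    return ordered[int(quantile * (len(ordered) - 1))]


def merge_round(adj, attrs, quantile=0.25):
    """Select the node pairs to merge in one round.

    Scores every node pair, derives a threshold from the scores, and
    greedily accepts disjoint pairs whose cost does not exceed it.
    """
    scored = sorted(
        (merge_cost(adj, attrs, u, v), u, v)
        for u, v in combinations(sorted(adj), 2)
    )
    if not scored:
        return []
    threshold = round_threshold([cost for cost, _, _ in scored], quantile)
    used, merged = set(), []
    for cost, u, v in scored:
        if cost > threshold:
            break  # remaining pairs are all above the threshold
        if u in used or v in used:
            continue  # each node joins at most one merge per round
        used.update((u, v))
        merged.append((u, v))
    return merged


# Toy attributed graph: a and b share neighbors and attribute, as do c and d.
example_adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b"},
}
example_attrs = {"a": "x", "b": "x", "c": "y", "d": "y"}

print(merge_round(example_adj, example_attrs))  # [('a', 'b'), ('c', 'd')]
```

In this toy graph the pairs (a, b) and (c, d) have identical neighborhoods and attributes, so their cost is 0 and they fall under the threshold, while every cross pair has cost 1 and is rejected. Repeating such rounds until no pair qualifies would yield progressively coarser summaries under these assumptions.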
Keywords/Search Tags: Big Data, Distributed Graph Computing, Graph Summarization, Attributed Graph