Font Size: a A A

Research On Dense Subgraph Discovery Method Of Large-scale Graph Data In Distributed Environment

Posted on:2020-08-25Degree:MasterType:Thesis
Country:ChinaCandidate:R R LiFull Text:PDF
GTID:2370330578957380Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the application of the Internet and the continuous progress of science and technology,data from all walks of life are accumulating on an unprecedented scale.As an important data structure to describe data,graph is widely used in big data research by scholars,graph data mining has become an important research topic in academia.In large scale graphs,the dense parts of the graph are often the important parts of the graph data,therefore,the discovery of dense subgraphs in large-scale graph data has become a hot topic.Moreover,dense subgraph problem can be widely used in frequent subgraph problem,community discovery and other problems,which has important research significance.Discovering the dense subgraph problem in graph data is a well-known NP-hard problem.At present,some scholars have designed different methods to discover dense subgraphs,but there are some problems,such as inapplicable to processing large-scale graph data,the dense subgraphs found are disconnected,dense subgraphs are local optimal solutions,the algorithm is not suitable for finding dense subgraphs in some type of graph data,and so on.In order to solve the above problems,this paper designs a dense subgraph discovery algorithm suitable for processing large-scale graph data in different application backgrounds,taking advantage of the powerful storage and computing advantages of distributed Hadoop platform.The dense subgraph discovery algorithm in this paper implements the storage of large data sets through HDFS and the computation of large data sets through MapReduce.Firstly,the algorithm preprocesses the graph to eliminate the influence of parallel edges and rings on dense subgraph discovery.Secondly,in order to quickly discover dense subgraphs in large graphs,the graph pruning strategy is used to quickly remove the vertices that must not exist in dense subgraphs,and then extract the subgraphs composed of the vertices contained only in triangles in the remaining graphs.Finally,the initial subgraph is selected and updated continuously through two stages.The first stage:in each iteration,a vertex is selected from the neighbor node set of the vertices in the current graph,which maximizes the subgraph density after adding the vertex until the neighbor node set of the vertices in the current graph no longer has a vertex and terminates the iteration when the subgraph density increases.The second stage:in each iteration,a vertex is selected from the current subgraph to maximize the density of the subgraph after removing the vertex until the iteration is terminated when the density of the subgraph does not increase.In this thesis,the subgraph at this time is regarded as a dense subgraph discovered by the algorithm,and the advantage of Spark platform is utilized to analyze the connectivity in the graph with elastic distributed data set RDD.Experiments show that the dense subgraph discovery algorithm proposed in this thesis achieves good results in processing large-scale graph data in running time,density of dense subgraphs and connectivity of dense subgraphs,and has certain advantages compared with other algorithms.
Keywords/Search Tags:Distributed environment, Dense subgraph, Large-scale graph data, Connectivity
PDF Full Text Request
Related items