Font Size: a A A

Research On Algorithms For Dense Subgraph Mining In Large Sparse Graphs

Posted on:2020-10-25Degree:DoctorType:Dissertation
Country:ChinaCandidate:T YuFull Text:PDF
GTID:1360330620452206Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the increment of data volume and the development of user interaction model in the Internet,a big network is generated via the connected metadata.Graph is a widely used data model in database and data mining fields.Various real-world applications are built based on graph data analysis and management methods.The real-world graphs are not only sparse,but also unevenly distributed due to the huge range of vertex degree.In the dense part,the connectivity among vertices is higher than the average ratio between edge number and vertex number of a graph,and the information in it is rich.Thus,it is valuable to mine the dense subgraphs in the real-world commercial applications.According to the type and power of constraints,the dense subgraphs are classified into different groups.In this paper,we focus on the efficiency of mining algorithms of two dense subgraphs with most powerful constraints: maximal cliques and k-edge connected components.Most existing maximal clique enumeration algorithms are based on tree search framework.The common bottleneck of those algorithms is the huge search space dealing with big-degree vertices.Recent researches try to parallelize the maximal clique enumeration algorithms to improve the calculating speed.However,there are two challenges in the paralleling procedure: Firstly,how to choose the appropriate partition of a graph to make the total calculation overhead minimal;Secondly,how to optimize the overall performance between the data redundancy and load balance.In this paper,we research on a new framework of the maximal clique enumeration problem,which is fast,stable and has outstanding scalability.We first proposed a two-stage method named CMC.It not calculate the maximal cliques directly,but first generates a set of candidate cliques which contains all the maximal cliques,then refines the non-maximal cliques via constructing an inverse tire tree.Based on the CMC framework,we designed two optimized strategies to reduce the time and space complexity: the first one using bitset operations instead of set operations,the second one ranking the vertices in a degeneracy order.With the two strategies,the construction stage can be finished in a linear time,and the refinement stage is also very fast.Moreover,we implemented the two-stage algorithm on the MapReduce framework.The parallel implementation is fast and performs well in scalability.The existing k-edge connected component mining algorithms are categorized into two types: exact methods and approximate methods.The exact methods adopt cut-based framework.They iterated search the min-cut and split subgraphs until the min-cut in all subgraphs are no smaller than k,which makes them quite slowly.The approximate methods merge vertices according to the s-t connectivity theorem,which produces inaccurate results.As the search bound is not clear in k-edge connected component mining,the performance of the algorithms can not be improved via parallelization.Thus,the biggest challenge in k-edge connected component mining is to improve the algorithm speed with high precision.In this paper,we proposed a connectivity detection algorithm on local k-ECCs.The detection algorithm executes before merging vertices to check the local connectivity in a k-core.To improve the accuracy of the detection,we proposed a novel exact algorithm to calculate the max s-t flow between two vertices.Based on the local connectivity detection,we also proposed a pruning strategy and a maximal merge strategy to reduce the decomposition iteration of the algorithm.
Keywords/Search Tags:Graph Algorithm, Maximal Clique Enumeration, Parallel Computing, K-Edge Connected Component, Max s-t Flow
PDF Full Text Request
Related items