Font Size: a A A

Research On Frequent Pattern Mining In A Single Large Graph

Posted on:2015-02-02Degree:MasterType:Thesis
Country:ChinaCandidate:T M ZhangFull Text:PDF
GTID:2308330482955549Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
As a common data structure, graphs are increasingly being used to model scientific data. How to develop an effective and efficient algorithm for mining interesting patterns from the graph database has attracted wide attention. Currently there are two different types of graph data in mining frequent subgraphs, one assumes a database of many small graphs, the other assumes a single large graph. Although the majority of modern applications are modeled as a single large graph, the proposed mining algorithms are not a lot. Moreover, existing algorithms that are guaranteed to find all frequent patterns or algorithms that are heuristic, which tend to miss a large number of frequent patterns. They are computationally expensive and do not scale to large datasets, thus on the basis of the analysis of existing frequent pattern mining work, developing efficient and scalable algorithms that find the complete set of frequent patterns in the single graph setting is the focus of this thesis.Firstly, this thesis makes a detailed analysis of the current frequent subgraphs mining algorithms which are main memory-based techniques available, and then it proposes an isomorphism testing-free approach to extract frequent subgraphs based on the depth-first search strategy.Secondly, with the rapid growth of data, one fundamental difficulty in frequent subgraph mining on a single large graph is the size of the graph. Sometimes, the size of a graph can be too large to load into the memory. Although some disk-based solutions have been brought forward, the cost of access to graph data and I/O generated by the disk are very expensive. There have not been accurate and efficient parallel frequent subgraph mining algorithm in the single graph setting so far, thus frequent subgraph mining on a single large graph based on maximum clique in frequency counting in parallel computing framework algorithm is presented in this thesis.Moreover, to improve efficiency, the algorithm is optimized. When identifying all frequent size-1 subgraphs, the equal size partitioning to split the graph dataset is proposed at the beginning, it assigns each vertex to the partition with the minimum number of edges so that each partition takes the approximately equal number of edges. In the stage of generation task, it is necessary to remove duplicate size-(k+1) subgraphs so as to reduce the amount of communication. Besides, six new pruning methods are added in the stage of frequency counting to avoid maximum clique problem. During the process of mining frequent subgraphs, it customizes new partitioner instead of the default partition method in order to balance the workload.Finally, as the calculation of the maximum clique problem is NP-hard, it is necessary to find a less expensive method. Therefore, this thsis proposes another frequent subgraph mining algorithm on a single large graph based on AMNI in frequency counting in parallel computing framework.In conclusion, this thesis studies the problem of mining frequent subgraphs in a single large graph, and then proposes efficient solutions to the weaknesses of recent algorithms. Extensive experiments validate the efficiency and accuracy of our methods.
Keywords/Search Tags:frequent subgraph, pattern mining, parallel computing, maximum clique
PDF Full Text Request
Related items