Research On Frequent Pattern Mining In A Single Large Graph

Posted on:2015-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:T M Zhang

Full Text:PDF

GTID:2308330482955549

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

As a common data structure, graphs are increasingly being used to model scientific data. How to develop an effective and efficient algorithm for mining interesting patterns from the graph database has attracted wide attention. Currently there are two different types of graph data in mining frequent subgraphs, one assumes a database of many small graphs, the other assumes a single large graph. Although the majority of modern applications are modeled as a single large graph, the proposed mining algorithms are not a lot. Moreover, existing algorithms that are guaranteed to find all frequent patterns or algorithms that are heuristic, which tend to miss a large number of frequent patterns. They are computationally expensive and do not scale to large datasets, thus on the basis of the analysis of existing frequent pattern mining work, developing efficient and scalable algorithms that find the complete set of frequent patterns in the single graph setting is the focus of this thesis.Firstly, this thesis makes a detailed analysis of the current frequent subgraphs mining algorithms which are main memory-based techniques available, and then it proposes an isomorphism testing-free approach to extract frequent subgraphs based on the depth-first search strategy.Secondly, with the rapid growth of data, one fundamental difficulty in frequent subgraph mining on a single large graph is the size of the graph. Sometimes, the size of a graph can be too large to load into the memory. Although some disk-based solutions have been brought forward, the cost of access to graph data and I/O generated by the disk are very expensive. There have not been accurate and efficient parallel frequent subgraph mining algorithm in the single graph setting so far, thus frequent subgraph mining on a single large graph based on maximum clique in frequency counting in parallel computing framework algorithm is presented in this thesis.Moreover, to improve efficiency, the algorithm is optimized. When identifying all frequent size-1 subgraphs, the equal size partitioning to split the graph dataset is proposed at the beginning, it assigns each vertex to the partition with the minimum number of edges so that each partition takes the approximately equal number of edges. In the stage of generation task, it is necessary to remove duplicate size-(k+1) subgraphs so as to reduce the amount of communication. Besides, six new pruning methods are added in the stage of frequency counting to avoid maximum clique problem. During the process of mining frequent subgraphs, it customizes new partitioner instead of the default partition method in order to balance the workload.Finally, as the calculation of the maximum clique problem is NP-hard, it is necessary to find a less expensive method. Therefore, this thsis proposes another frequent subgraph mining algorithm on a single large graph based on AMNI in frequency counting in parallel computing framework.In conclusion, this thesis studies the problem of mining frequent subgraphs in a single large graph, and then proposes efficient solutions to the weaknesses of recent algorithms. Extensive experiments validate the efficiency and accuracy of our methods.

Keywords/Search Tags:

frequent subgraph, pattern mining, parallel computing, maximum clique

PDF Full Text Request

Related items

1	Research On Parallel Frequent Graph Pattern Mining
2	Research And Application Of Frequent Subgraph Mining Algorithm
3	Frequent Subgraph Mining In Graph Databases Based On MapReduce
4	Study On Association Rule Mining Algorithm Based On The Pattern Of Polar Clique
5	Research On Graph Frequent Substructure Mining Method
6	Research Of Software Subgraph Pattern Mining Algorithms Based On Software Execution Graph
7	Key Techniques Of Map Database Frequent Pattern Mining
8	The Research On Frequent Subtrees Mining And Corresponding Techniques
9	Research On Location Of Software Defects Based On Frequent API Usage Pattern Mining
10	Frequent Subgraph Mining And Its Application In Compound Property Prediction