Font Size: a A A

Research Of Paralleled Frequent Subgragh Mining Algorithm PG-Miner Based On Claster Environment

Posted on:2011-05-29Degree:MasterType:Thesis
Country:ChinaCandidate:P F ChenFull Text:PDF
GTID:2178360305465282Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The study of mining frequent patterns in large database has been focused all along in data mining fields. And frequent subgraph mining aims at resolving the structural pattern mining issues, such as chemistry, biology, WWW applications and other related problems. Because of the large scale and complex features of the graph abstracted from practical applications, the researches for high-performance frequent sub-graph mining algorithms have been a research issue. The main content of this thesis is proposed a parallel frequent itemsets mining algorithm:HPFP-Miner, then describes a parallel frequent subgraphs mining algorithm:PG-Miner. The main contents can be stated as follow.1. Parallel frequent itemsets mining algorithm:HPFP-Miner.In this section we presented the HPFP-Miner algorithm. In order to generate complete and non-redundant result sets, in this algorithm we proposed two parallel environment-based data structures HPFP-trees and HPFP-Forest which were used to compress and store the information of the frequent itemsets. In contrast to FP tree, the root of HPFP tree is not a "null" but an item to identify the specific tree, and HPFP tree's indexes are from children to parent. We linked all of the HPFP trees with a pointer list to constitute a HPFP-Forest. By doing so, the master node can dispatched HPFP tree to distributed processing nodes during execution phase, which make the algorithm more flexible and easier to control the load balance of all processing nodes.2. Parallel frequent subgraphs mining algorithm:PG-Miner.In this section, we proposed a new parallel frequent subgraphs mining algorithm PG-Miner. The algorithm dispatch the most complexity of sub-graph isomorphism to different processing nodes, which made the algorithm's execution time decrease considerably. The main strategy of the algorithm contains two steps. Firstly, we divided the mining process into two parts. One part is generate the frequent sub-tree sets by the master node, another part dispatches sub-tree sets to different processing nodes. Secondly, we processed the edge expansion and frequent sub-graph isomorphism which are the most complexity parts of frequent subgraphs mining in parallel. After algorithm analysis,we concluded that our algorithm's time complexity is o (2n* n2/k), where n is the number of frequent edges in graph sets, and k is the paralleled processing node number (total number of nodes is k+1). Finally, we validated the feasibility of the algorithm in different real and synthetic data sets.
Keywords/Search Tags:data mining, association rules mining, frequent subgraphs mining, parallel association rules mining
PDF Full Text Request
Related items