Research Of Paralleled Frequent Subgragh Mining Algorithm PG-Miner Based On Claster Environment

Posted on:2011-05-29

Degree:Master

Type:Thesis

Country:China

Candidate:P F Chen

Full Text:PDF

GTID:2178360305465282

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

The study of mining frequent patterns in large database has been focused all along in data mining fields. And frequent subgraph mining aims at resolving the structural pattern mining issues, such as chemistry, biology, WWW applications and other related problems. Because of the large scale and complex features of the graph abstracted from practical applications, the researches for high-performance frequent sub-graph mining algorithms have been a research issue. The main content of this thesis is proposed a parallel frequent itemsets mining algorithm:HPFP-Miner, then describes a parallel frequent subgraphs mining algorithm:PG-Miner. The main contents can be stated as follow.1. Parallel frequent itemsets mining algorithm:HPFP-Miner.In this section we presented the HPFP-Miner algorithm. In order to generate complete and non-redundant result sets, in this algorithm we proposed two parallel environment-based data structures HPFP-trees and HPFP-Forest which were used to compress and store the information of the frequent itemsets. In contrast to FP tree, the root of HPFP tree is not a "null" but an item to identify the specific tree, and HPFP tree's indexes are from children to parent. We linked all of the HPFP trees with a pointer list to constitute a HPFP-Forest. By doing so, the master node can dispatched HPFP tree to distributed processing nodes during execution phase, which make the algorithm more flexible and easier to control the load balance of all processing nodes.2. Parallel frequent subgraphs mining algorithm:PG-Miner.In this section, we proposed a new parallel frequent subgraphs mining algorithm PG-Miner. The algorithm dispatch the most complexity of sub-graph isomorphism to different processing nodes, which made the algorithm's execution time decrease considerably. The main strategy of the algorithm contains two steps. Firstly, we divided the mining process into two parts. One part is generate the frequent sub-tree sets by the master node, another part dispatches sub-tree sets to different processing nodes. Secondly, we processed the edge expansion and frequent sub-graph isomorphism which are the most complexity parts of frequent subgraphs mining in parallel. After algorithm analysis,we concluded that our algorithm's time complexity is o (2n* n2/k), where n is the number of frequent edges in graph sets, and k is the paralleled processing node number (total number of nodes is k+1). Finally, we validated the feasibility of the algorithm in different real and synthetic data sets.

Keywords/Search Tags:

data mining, association rules mining, frequent subgraphs mining, parallel association rules mining

PDF Full Text Request

Related items

1	Design Of Frequent Pattern Mining Algorithm LPS-Miner And Research On Parallel Formulations
2	Research On Link Prediction Based On Subgraphs Association Rules
3	Research On Mining Technology Of Association Rules And Meta-Rules
4	Researches And Applications Of The Association Rule Mining Algorithms Based On Spleen Feebleness Diagnosis
5	Research On The Basic Technology Of Association Rules
6	Research And Realization On Association Rules Mining Algorithms
7	Research Of Association Rules Mining Algorithm Based On Graph
8	A Study On Association Rules Mining Algorithm And Its Application On Web Mining
9	Research On Association Rules Algorithm In Data Mining
10	Research On Application Of Association Rules Mining Algorithm In Web Log Mining