Font Size: a A A

Study On Multi-Level Association Rules Mining Algorithm Based On FP-Tree

Posted on:2006-01-16Degree:MasterType:Thesis
Country:ChinaCandidate:X Q HuFull Text:PDF
GTID:2168360155472726Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
Data mining is a process to reaveal latent and interesting knowledge from massive data, and an effective approach to solve the problem of "rich data and poor knowledge". Association rules mining can reveal interesting correlations between itemsets from massive data. It is an important subject of data mining and is widely used in real life. According to the number of hierarchies involved in rules, association rules can be classified into single-level association rules and multi-level association rules. Compared with single-level association rules, multi-level association rules can provide richer and more generalized knowledge, and meet with more users'requirements, so it is of great practical value to study Multi-level association rules mining. The proposed algorithms for mining multi-level association rules such as Cumulate algorithm, ML_T2L1 algorithm are based on Apriori algorithm. These algorithms still adopt "candidate generate and test"method to get frequent patterns which cause large cost in two aspects: (1) I/O cost. In these algorithms, the database is needed to scan k times (k is the largest length of the candidates), which results in large I/O cost; (2) Computing cost. These algorithms may generate a large number of candidates whose frequency is checked by pattern matching, which leads large computing cost. So these algorithms are inefficient. FP_Growth algorithm is an efficient algorithm for mining single-level association rules. It does not generate candidates and only scans database two times. FP_Growth algorithm overcomes the shortcomings of Apriori algorithm effectively, thus the efficiency of FP_Growth algorithm improves a lot compared with Apriori algorithm. By expanding FP_Growth algorithm, the author proposes a highly efficient algorithm MLAR-FP for mining multi-level association rules. The following measures are applied to expand FP_Growth algorithm: (1) Expand every transaction by adding all ancestors of each item during the process of scanning the database. This measure ensures that we can get multi-level association rules; (2) Delete the ancestors that are not frequent items in time to compress search space and enhance the efficiency of mining; (3) Avoid generating redundant frequent patterns. To verify the correctness and efficiency of MLAR-FP algorithm, the author conducts an experiment on a sales database of a medicine company, and compares it with Cumulate algorithm. The results of the experiment show that MLAR-FP algorithm gets the same result as Cumulate algorithm does and inherits the strongpoint of high efficiency of FP_Growth algorithm. Because MLAR-FP algorithm uses divided conquering approach to get frequent patterns, it has the potential of parallel computing. According to this feature, the author develops parallel MLAR-FP algorithm based on message passing interface under the environment of cluster of workstation. The parallel policy of this algorithm is data parallel and parallel mode is master/slave mode. Considering the difference of each node's capability, this algorithm balances the load of all computing nodes by distributing data to each node dynamically.
Keywords/Search Tags:Multi-level association rules, FP_Growth algorithm, MLAR-FP algorithm, Parallel computing
PDF Full Text Request
Related items