
Multicast-based Distributed Association Rule Mining Algorithm

Posted on: 2007-06-08   Degree: Master   Type: Thesis
Country: China   Candidate: H Zhao   Full Text: PDF
GTID: 2208360185471222   Subject: Computer software and theory
Abstract/Summary:
With the development of database technology and the spread of database management systems (DBMS), every department has accumulated large amounts of data. Because the data are huge and distributed, and because database systems offer few analysis methods, people cannot discover the hidden associations in the data or use the data to predict future trends. Data mining is the process of finding novel, useful, and understandable patterns in large volumes of data, which makes it very important.

This thesis analyzes association rule theory and its applications in detail and studies distributed association rule mining in depth. Previous distributed algorithms incur heavy communication overhead and require many database scans. To solve these problems, we build on earlier work and propose four original association rule mining algorithms: PDDM, GDS, DFP, and MGMF.

The PDDM algorithm improves on the scalability of previous algorithms while requiring less communication. The GDS and DFP algorithms reduce database-scan I/O time relative to the Apriori algorithm and reduce communication relative to other distributed algorithms such as FDM. The algorithm for mining global maximum frequent itemsets (MGMF) differs from other maximum frequent itemset mining algorithms: it uses the FP-tree structure to obtain all global maximum frequent itemsets in a single mining pass, and its superset checking is simple and fast. MGMF is more efficient than previous maximum frequent itemset mining algorithms and finds all maximum frequent itemsets with only two database scans.
The main contribution is: (1) Improving the DDM algorithm by proposing a PDDM algorithm with preference weights, which reduces the communication of the distributed algorithm and improves scalability.
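To make the notion of global maximum frequent itemsets (often called maximal frequent itemsets) concrete, here is a minimal brute-force sketch in Python. It is not the MGMF algorithm from the thesis and uses no FP-tree; it only illustrates the definition under stated assumptions: each site counts itemsets locally, the local counts are summed into global counts, and a naive superset check keeps only the frequent itemsets that have no frequent proper superset. The function names, the `max_size` cap, and the sample data are all hypothetical.

```python
from itertools import combinations


def local_counts(site, max_size):
    """Count every itemset of up to max_size items in one site's transactions."""
    counts = {}
    for transaction in site:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def global_maximal_frequent(sites, min_support, max_size=3):
    """Return globally frequent itemsets that have no frequent proper superset.

    Illustrative only: enumeration is exponential in max_size, and itemsets
    larger than max_size are never considered.
    """
    total = sum(len(site) for site in sites)

    # Sum the per-site counts to obtain global support counts.
    merged = {}
    for site in sites:
        for itemset, count in local_counts(site, max_size).items():
            merged[itemset] = merged.get(itemset, 0) + count

    frequent = [s for s, c in merged.items() if c / total >= min_support]

    # Naive superset check: drop any itemset strictly contained in another.
    maximal = [s for s in frequent if not any(s < t for t in frequent)]
    return sorted(sorted(s) for s in maximal)


# Two hypothetical sites, each holding its own partition of the transactions.
sites = [
    [["a", "b", "c"], ["a", "b"], ["a", "c"]],
    [["a", "b", "c"], ["b", "c"], ["a", "b", "c"]],
]

print(global_maximal_frequent(sites, min_support=0.5))
# prints [['a', 'b', 'c']]
```

With `min_support=0.5` the itemset {a, b, c} appears in 3 of the 6 transactions and is the only maximal one; raising the threshold to 0.6 makes it infrequent, so the three pairs become the maximal itemsets instead.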
Keywords/Search Tags:distributed data mining, association rules, sampling, Grid, FP-tree, maximum frequent itemsets