Sampling-Based Distributed Association Rule Mining Algorithm

Posted on: 2007-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: M H Li
Full Text: PDF
GTID: 2208360185472067
Subject: Computer software and theory
Abstract/Summary:
Data mining is the nontrivial process of finding patterns in very large databases. These patterns are valid, novel, potentially useful, and understandable. The goal of data mining is to find meaningful patterns in large data collections, so it has wide practical value. A core issue in data mining research is how to efficiently find the association rules that satisfy users' needs.

Sampling means drawing a random sample S from the database D and mining S instead of D. The sampling method is well suited to mining large databases. In the distributed case, when the data volume held by each site is relatively large, a sample set can be obtained at each site by random sampling. Replacing the result of mining the whole distributed database with the result of mining these random samples can improve mining efficiency.

A dynamic itemset counting technique has been proposed in which the database is partitioned into blocks marked by start points. In this variation, new candidate itemsets can be added at any start point, unlike Apriori, which determines new candidate itemsets only immediately before each complete database scan. The technique is dynamic in that it estimates the support of all itemsets counted so far and adds new candidate itemsets if all of their subsets are estimated to be frequent. The resulting algorithm requires fewer database scans than Apriori, reduces the I/O load, and improves mining efficiency.

The key factor influencing the efficiency of distributed data mining is the communication traffic among the sites of the distributed database. The meta-learning method, first presented by Prodromidis et al. in 2000, builds the final global predictive model by means of ensemble learning. The advantages of the meta-learning method are that, in the base learning phase, each site can independently select an appropriate learning algorithm to build local base classifiers, and there is no communication or synchronization overhead among the sites, so the...
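The sampling idea in the abstract (mine a random sample S at each site instead of the full database D) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the function names, the brute-force counting, and the choice to pool only sample counts across sites are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def sample_site(transactions, fraction, rng):
    """Draw a simple random sample (without replacement) from one site's data."""
    k = max(1, int(len(transactions) * fraction))
    return rng.sample(transactions, k)


def count_itemsets(transactions, max_size):
    """Count every itemset of up to max_size items (brute force, illustration only)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def estimate_frequent(sites, fraction, min_support, max_size=2, seed=0):
    """Estimate globally frequent itemsets from per-site random samples.

    Assumption: each site sends only its sample counts, which are pooled
    to estimate support over the whole distributed database.
    """
    rng = random.Random(seed)
    total = Counter()
    n = 0
    for transactions in sites:
        sample = sample_site(transactions, fraction, rng)
        total.update(count_itemsets(sample, max_size))
        n += len(sample)
    return {s: c / n for s, c in total.items() if c / n >= min_support}


# Hypothetical data: two sites, each holding a list of transactions.
sites = [
    [["a", "b"], ["a", "c"]],
    [["a", "b"], ["b", "c"]],
]
frequent = estimate_frequent(sites, fraction=1.0, min_support=0.5)
```

With `fraction=1.0` every transaction is used, so the estimate equals the exact support. Smaller fractions trade accuracy for less counting work at each site, which is the trade-off the abstract describes.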
Keywords/Search Tags:Distributed association rule mining, Sampling, Meta-learning, Similar degree, Concept lattice