
Algorithms and Implementation for Mining Association Rules with Constraints in a Distributed Environment

Posted on: 2006-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: J F Du
Full Text: PDF
GTID: 2208360182968881
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
The rapid development of distributed computing environments has driven considerable progress in distributed data mining, and in practical data mining the use of constraints can improve mining efficiency. This thesis addresses the problem of mining association rules with constraints in a distributed setting.
First, based on the characteristics of distributed databases and of item constraints, two algorithms for distributed mining of association rules with item constraints, DMAIC and DAMICFP, are developed. DMAIC is based on the Apriori algorithm and DAMICFP on the FP-growth algorithm. DMAIC offers high reliability and a simple communication protocol, and suits systems with low communication requirements. DAMICFP offers high efficiency and excellent communication quality, and suits systems with high communication requirements.
Second, multiple rule constraints (anti-monotone, monotone, succinct, and convertible constraints) are integrated into association rule mining over a vertical data layout, making full use of lattice theory and the decomposition of the itemset lattice into equivalence classes. The corresponding algorithms, each based on the Eclat algorithm, are presented. A bottom-up search method is proposed in which the constraints are checked while the frequent itemsets are being computed. The new algorithms scan the database only a few times, need no separate pruning of candidate itemsets, and can run entirely in memory.
Third, an algorithm for distributed mining of association rules with multiple rule constraints, DMCASE, is presented; it combines sampling with the constrained Eclat algorithm. At each database site, the sampling algorithm and the constrained Eclat algorithm are run to produce the local frequent itemsets that satisfy the constraints. These are then combined into the global frequent itemsets that satisfy the constraints, based on learning from induction.
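To make the constrained Eclat idea concrete, the following is a minimal sketch, not the thesis's own implementation. It assumes a single anti-monotone constraint (here a hypothetical budget on total item price) and checks it during the bottom-up, depth-first search over a vertical (item to transaction-id set) layout. The names `vertical_layout`, `constrained_eclat`, `price`, and `within_budget` are illustrative assumptions and do not come from the source.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def vertical_layout(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Build the vertical layout: item -> set of transaction ids (one scan)."""
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def constrained_eclat(
    transactions: List[Set[str]],
    min_support: int,
    constraint: Callable[[Itemset], bool],
) -> Dict[Itemset, int]:
    """Eclat-style search that drops an itemset as soon as it is infrequent
    or violates the constraint. This is only safe for anti-monotone
    constraints (assumption): a violation by X implies a violation by
    every superset of X."""
    tidsets = vertical_layout(transactions)
    start: List[Tuple[str, Set[int]]] = sorted(
        (
            (item, tids)
            for item, tids in tidsets.items()
            if len(tids) >= min_support and constraint(frozenset([item]))
        ),
        key=lambda pair: pair[0],
    )
    results: Dict[Itemset, int] = {}

    def extend(prefix: Itemset, members: List[Tuple[str, Set[int]]]) -> None:
        # `members` is one equivalence class: itemsets sharing `prefix`.
        for i, (item, tids) in enumerate(members):
            itemset = prefix | {item}
            results[itemset] = len(tids)
            next_class: List[Tuple[str, Set[int]]] = []
            for other, other_tids in members[i + 1:]:
                joined = tids & other_tids  # support via tidset intersection
                if len(joined) >= min_support and constraint(itemset | {other}):
                    next_class.append((other, joined))
            extend(itemset, next_class)

    extend(frozenset(), start)
    return results


# Hypothetical example data: item prices and a total-price budget.
price = {"a": 1, "b": 2, "c": 4}


def within_budget(itemset: Itemset) -> bool:
    return sum(price[i] for i in itemset) <= 5


example = constrained_eclat(
    [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}],
    min_support=3,
    constraint=within_budget,
)
```

Because the check happens while each equivalence class is being built, itemsets that violate the constraint never enter the next class, so no separate candidate-pruning pass is needed. Monotone, succinct, and convertible constraints need different handling and are not covered by this sketch.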
The prefix tree and bit matrix data structures ensure that DMCASE scans the whole database only once. Completeness analysis and experiments show that DMCASE achieves both mining efficiency and precision.
In addition, the application of association rule mining with item constraints to bioinformatics is studied. An FP-growth-based algorithm, ICFP, is developed for mining association rules with item constraints from gene expression data. Experiments on yeast gene expression profiles show that ICFP is applicable to mining association rules from gene expression profiles and can greatly improve mining performance.
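As a rough illustration of what an item constraint does, the sketch below assumes the simplest form of constraint: every reported itemset must contain a given set of required items. It pushes that constraint into mining by projecting the database onto the transactions that contain the required items. It does not implement FP-growth or ICFP; a brute-force enumeration stands in for the mining step. The function name, the `max_extra` cap, and the discretized gene labels such as `geneA_up` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def mine_with_required_items(
    transactions: List[Set[str]],
    required: Iterable[str],
    min_support: int,
    max_extra: int = 2,
) -> Dict[FrozenSet[str], int]:
    """Return frequent itemsets that contain every item in `required`.

    Assumption: the item constraint is 'must contain all required items'.
    Brute-force enumeration replaces FP-growth here for brevity.
    """
    required_set = frozenset(required)
    # Projection: keep only transactions that satisfy the item constraint,
    # and strip the required items, which are present in all of them.
    projected = [
        frozenset(t) - required_set
        for t in transactions
        if required_set <= frozenset(t)
    ]
    counts: Counter = Counter()
    for rest in projected:
        for k in range(max_extra + 1):
            for extra in combinations(sorted(rest), k):
                counts[required_set | frozenset(extra)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical discretized expression profiles, one set of items per sample.
samples = [
    {"geneA_up", "geneB_up", "geneC_down"},
    {"geneA_up", "geneB_up"},
    {"geneA_up", "geneC_down"},
    {"geneB_up", "geneC_down"},
]
frequent = mine_with_required_items(samples, {"geneA_up"}, min_support=2)
```

Projecting first means that transactions which cannot contribute to any itemset satisfying the constraint are never examined during counting, which is one plausible source of the performance gain that item constraints provide.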
Keywords/Search Tags: data mining, distributed data mining (DDM), association rules with constraints