Improved Algorithm For Parallel Association Rules Mining

Posted on: 2008-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: J S Sun
GTID: 2178360215958223
Subject: Computer application technology
Abstract/Summary:
Association rules are an important technique in data mining. Successful applications in fields such as business have made association rule mining one of the most mature, important, and active research areas. The core of mining association rules is finding frequent itemsets, and many algorithms exist for this task, such as Apriori and Partition. Parallelization has been introduced to make frequent itemset mining more efficient. Algorithm CD (Count Distribution) is a simple parallelization of Apriori that aims to reduce communication traffic and achieve a better distribution of the workload.

This thesis presents an improved parallel algorithm for mining association rules that addresses several problems in algorithm CD: heavy I/O operations, duplicated data structures, and wasteful memory use. Building on CD, the algorithm partitions the database using dynamic dataset partitioning, and a control processor distributes the partitions to the other processors, which reduces the heavy I/O load. The same control processor manages the other processors so that the mining itself runs in parallel. Each processor stores its data in a P-tree structure, which streamlines the data structure and uses memory efficiently, so frequent itemsets can be found quickly and the database can be mined efficiently. Finally, both algorithms are compared experimentally; the results show that the improved algorithm is more efficient than CD and achieves the primary goal of parallelization.
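As background for the CD baseline, the following is a minimal sequential sketch of Count Distribution as it is commonly described: every processor holds one horizontal partition of the database, all processors count the same candidate itemsets on their own partition, the local counts are summed into global counts, and the next level of Apriori candidates is generated from the globally frequent itemsets. The "processors" here are just list elements and the summation loop stands in for the count-exchange step; the function names (`local_counts`, `apriori_gen`, `count_distribution`) are hypothetical, and the sketch does not include the thesis's dynamic partitioning or P-tree improvements.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, candidates):
    """Count the support of each candidate itemset within one processor's partition."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts


def apriori_gen(frequent_prev, k):
    """Join frequent (k-1)-itemsets into k-itemsets, pruning any candidate
    that has an infrequent (k-1)-subset (the Apriori property)."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(
                frozenset(s) in prev for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def count_distribution(partitions, min_support):
    """Simulate CD: each 'processor' counts the same candidates on its own
    partition; summing the local counts plays the role of the count exchange."""
    candidates = {frozenset([item]) for part in partitions for t in part for item in t}
    frequent = {}
    k = 1
    while candidates:
        global_counts = Counter()
        for part in partitions:  # these loops would run in parallel in a real system
            global_counts.update(local_counts(part, candidates))
        level = {c: n for c, n in global_counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        candidates = apriori_gen(level.keys(), k)
    return frequent


# Two "processors", each holding a slice of a five-transaction database.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c"], ["a", "b", "c"]],
]
result = count_distribution(partitions, min_support=3)
```

With a minimum support of 3, the single items and all three pairs are frequent, while `{a, b, c}` (support 2) is not. Note that every processor must scan its whole partition at each level, which is the repeated I/O cost the abstract identifies in CD.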
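The abstract does not specify the internal layout of the P-tree. Purely as a hypothetical illustration of why a tree can store a partition more compactly than a flat transaction list, the sketch below uses a plain prefix tree (similar in spirit to an FP-tree without header links): transactions are sorted by a fixed item order, shared prefixes share nodes, and each node counts how many transactions pass through it. The `PrefixTree` and `Node` names and the support-counting traversal are assumptions for illustration, not the thesis's P-tree.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    item: Optional[str]
    count: int = 0  # number of transactions whose path passes through this node
    children: Dict[str, "Node"] = field(default_factory=dict)


class PrefixTree:
    """Hypothetical compact transaction store: shared prefixes share nodes."""

    def __init__(self, order):
        self.root = Node(None)
        # Fixed global item order; every item used must appear in it.
        self.rank = {item: i for i, item in enumerate(order)}

    def insert(self, transaction):
        self.root.count += 1  # root counts all transactions
        node = self.root
        for item in sorted(set(transaction), key=self.rank.__getitem__):
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    def support(self, itemset):
        """Number of stored transactions containing every item in itemset."""
        target = sorted(set(itemset), key=self.rank.__getitem__)
        return self._support(self.root, target)

    def _support(self, node, remaining):
        if not remaining:
            # All target items lie on the path to this node.
            return node.count
        total = 0
        for child in node.children.values():
            if child.item == remaining[0]:
                total += self._support(child, remaining[1:])
            elif self.rank[child.item] < self.rank[remaining[0]]:
                # Paths are sorted, so the next target item may still appear deeper.
                total += self._support(child, remaining)
        return total


tree = PrefixTree(order=["a", "b", "c"])
for t in [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]:
    tree.insert(t)
```

The five transactions (12 item occurrences) collapse into 6 tree nodes here, and `tree.support({"a", "b"})` recovers the same count of 3 that a full scan would give. Real compact structures typically order items by descending frequency to increase prefix sharing.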
Keywords/Search Tags: parallel association rules, frequent itemsets, P-tree, dynamic dataset partitioning