Research On The Parallel Mining Algorithms For Association Rules

Posted on: 2003-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhao
GTID: 2168360092493402
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, large amounts of data have accumulated in all kinds of departments. It has become increasingly urgent to mine useful information and knowledge automatically in order to support decision-making. Data mining technology emerged to meet this need. Association rule mining is an important branch of data mining and has become one of its most widely applied forms.

However, traditional association rule mining has mostly adopted serial algorithms, such as level-wise algorithms, non-level-wise algorithms, and algorithms that do not generate candidates. Among these, the Apriori algorithm is one of the more efficient for finding large itemsets in a transaction database. But none of the serial algorithms can avoid scanning the database repeatedly, which reduces mining efficiency and leaves them unable to meet the needs of large databases. With the development of distributed databases, parallel formulations with pruning rules have appeared, such as level-wise algorithms, DMA, and FMA, which improve efficiency dramatically. pSPADE, a parallel algorithm for the fast discovery of frequent sequences in large databases, decomposes the original search space into smaller suffix-based classes, each of which can be solved independently on a processor. In this way pSPADE maximizes data locality and minimizes synchronization.

However, pSPADE works on the assumption that each class and its intermediate id-lists fit in main memory, so it occupies considerable memory space. When handling larger databases, it can run out of memory. In this paper, a memory-extending scheme is proposed to solve this problem. It releases a large amount of memory by writing some classes to disk when there is not enough memory, and brings them back when enough memory is obtained. If necessary, these classes join the shared queue. At the end of this paper, the pros and cons of this scheme, as well as its applicability, are discussed.
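The memory-extending idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `SpillingClassStore`, the entry-count budget `max_items_in_memory`, the oldest-first eviction order, and the use of `pickle` files are all assumptions made for illustration, and the shared queue is left out.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillingClassStore:
    """Hypothetical store for suffix-based classes.

    Keeps classes in memory and writes the oldest ones to disk once an
    assumed budget (total number of id-list entries) is exceeded.
    """

    def __init__(self, max_items_in_memory, spill_dir=None):
        self.max_items = max_items_in_memory
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="classes_")
        self.in_memory = OrderedDict()  # suffix -> {sequence: id-list}
        self.on_disk = {}               # suffix -> path of spilled file
        self._file_counter = 0          # gives every spill file a unique name

    @staticmethod
    def _size(idlists):
        # Crude memory proxy: total number of id-list entries in the class.
        return sum(len(ids) for ids in idlists.values())

    def _used(self):
        return sum(self._size(c) for c in self.in_memory.values())

    def put(self, suffix, idlists):
        """Add a class, spilling older classes if the budget is exceeded."""
        self.in_memory[suffix] = idlists
        # Evict oldest-first, but never the class that was just added.
        while self._used() > self.max_items and len(self.in_memory) > 1:
            victim, data = self.in_memory.popitem(last=False)
            path = os.path.join(self.spill_dir, f"class_{self._file_counter}.pkl")
            self._file_counter += 1
            with open(path, "wb") as fh:
                pickle.dump(data, fh)
            self.on_disk[victim] = path

    def get(self, suffix):
        """Return a class, reloading it from disk if it was spilled."""
        if suffix in self.in_memory:
            return self.in_memory[suffix]
        path = self.on_disk.pop(suffix)  # KeyError if the class is unknown
        with open(path, "rb") as fh:
            data = pickle.load(fh)
        os.remove(path)
        self.put(suffix, data)  # may in turn spill another class
        return data


store = SpillingClassStore(max_items_in_memory=4)
store.put("A", {"A": [(1, 10), (1, 20), (2, 15)]})  # (sid, eid) pairs
store.put("B", {"B": [(1, 15), (3, 5)]})            # budget exceeded: "A" is spilled
reloaded = store.get("A")                           # "A" reloaded, "B" spilled instead
```

In this sketch a class is only read back when a caller asks for it, which is one possible reading of "when enough memory is obtained"; a scheduler that monitors free memory and re-queues spilled classes would be another.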
Keywords/Search Tags: KDD, data mining, association rule, distributed database, parallel algorithm