Association Rule Mining Algorithms for Large-Scale Databases

Posted on: 2008-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Ding
Full Text: PDF
GTID: 2208360215472139
Subject: Computer software and theory
Abstract/Summary:
Data mining is a technique for analyzing and understanding large volumes of source data and revealing the knowledge hidden in them. It is regarded as an important development in information processing. Over the past decade or more, the concepts and techniques of data mining have been established, and in the last few years some of them have been studied in greater depth. Like other new techniques, however, data mining has to develop gradually: from the creation of the concept, through recognition of its importance, wide discussion, and a few trial uses, to large-scale application. Most experts consider it to be in the wide-discussion phase today, so it still needs theoretical study and algorithmic exploration.

Association rule mining is an important branch of data mining. It has produced many valuable results, but a good many challenging problems remain open. For large databases in particular, research on improving mining performance and precision is necessary. Much of the current work on association rule mining concerns new mining theories, new algorithms, and improvements to existing methods.

In view of the current state and trends of data mining and association rule mining, we carry out research on both. After analyzing and categorizing the existing association rule algorithms, we focus on association rule algorithms for large databases. The algorithms designed here perform well in both usability and efficiency. The main work is as follows:

1. We discuss the bottleneck of the Apriori algorithm when discovering association rules between items in a large database of sales transactions, and propose a novel algorithm, BOM (Base On Matrix). The proposed algorithm is fundamentally different from Apriori: it stores the transaction information in a matrix and obtains the frequent k-itemsets directly through matrix operations. Empirical evaluation shows that it outperforms Apriori on large databases.

2. Mining association rules from a large database is very costly. Most of the proposed parallel algorithms for association rule mining have to scan the database at least twice, which limits their efficiency. We propose a parallel algorithm, SO (Scan Once), for shared-memory multiprocessors (SMP) that scans the database only once. It is fundamentally different from the known parallel algorithm CD (Count Distribution). Empirical evaluation shows that it outperforms CD.

3. Most current algorithms use a single support threshold to find association rules in the database, which makes them less efficient at finding significant rare data. Combining the Undirected Itemsets Graph with the RSAA algorithm, we propose a new algorithm for discovering association rules for significant rare data. The algorithm uses multiple supports to find the significant rare data. Compared with the RSAA algorithm, it is shown to perform better both in the efficiency of finding significant rare data and in the utility of the resulting association rules.
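
The matrix idea behind the first contribution can be illustrated with a minimal Python sketch. It is not the BOM algorithm itself, whose details the abstract does not give. It only assumes that transactions are encoded once as a 0/1 matrix (rows are transactions, columns are items) and that the support count of an itemset is the number of rows in which all of its columns are 1. The function names and the sample transactions are hypothetical.

```python
from itertools import combinations


def build_matrix(transactions):
    """Encode transactions as a 0/1 matrix in a single pass over the data.

    Returns the sorted item list (column labels) and the matrix itself,
    where matrix[r][c] == 1 means transaction r contains item c.
    """
    items = sorted({item for t in transactions for item in t})
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix


def support_count(matrix, cols):
    """Count the rows in which every column in `cols` is 1."""
    return sum(1 for row in matrix if all(row[c] for c in cols))


def frequent_k_itemsets(transactions, k, min_count):
    """Return {itemset: support count} for all k-itemsets meeting min_count.

    This enumerates every k-column combination; it is a brute-force
    illustration of matrix-based counting, not an optimized miner.
    """
    items, matrix = build_matrix(transactions)
    result = {}
    for cols in combinations(range(len(items)), k):
        count = support_count(matrix, cols)
        if count >= min_count:
            result[tuple(items[c] for c in cols)] = count
    return result


# Hypothetical sales transactions used only for illustration.
SAMPLE = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
```

Calling `frequent_k_itemsets(SAMPLE, 2, 2)` counts each pair of items directly from the matrix, so the original transactions are read only once, when the matrix is built.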
Keywords/Search Tags: data mining, association rules, large database, significant rare data, sequential algorithm, parallel algorithm