Association Rule Mining Algorithms for Large-Scale Databases

Posted on:2008-09-14

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Ding

Full Text:PDF

GTID:2208360215472139

Subject:Computer software and theory

Abstract/Summary:

Data mining is a technique for analyzing large volumes of source data and revealing the knowledge hidden in them. It is viewed as an important step in the evolution of information processing. Over the past decade or so, many data mining concepts and techniques have been proposed, and some have been studied in greater depth in the last few years. Like other new techniques, however, data mining must develop gradually, from concept creation through recognition of its importance, wide discussion, and a few trial uses to large-scale application. Most experts consider it to be in the wide-discussion phase today. It still needs theoretical study and algorithmic exploration.

Association rule mining is an important branch of data mining. It has produced many valuable results, but a good number of challenging problems remain open. For large databases, research on improving mining performance and precision is necessary. Much current work on association rule mining concerns new mining theories, new algorithms, and improvements to existing methods.

In view of the current state and trends of data mining and association rule mining, we study both. After analyzing and categorizing the current association rule algorithms, we focus on association rule algorithms for large databases. The algorithms we design perform well in terms of availability and efficiency. The main work is as follows:

1. We discuss the bottleneck of the Apriori algorithm in discovering association rules between items in a large database of sales transactions, and propose a novel algorithm, BOM (Base On Matrix). The proposed algorithm is fundamentally different from Apriori. It stores the transaction information in a matrix and obtains the frequent k-itemsets directly through matrix operations. Empirical evaluation shows that the algorithm outperforms Apriori on large databases.

2. Mining association rules from a large database is very costly. Most of the parallel algorithms proposed for association rule mining have to scan the database at least twice, which reduces their efficiency. We propose a parallel algorithm, SO (Scan Once), for shared-memory multiprocessors (SMP) that scans the database only once. This algorithm is fundamentally different from the known parallel algorithm CD (Count Distribution). Empirical evaluation shows that SO outperforms CD.

3. Most current algorithms use a single support threshold to find association rules in the database, which makes them inefficient at finding significant rare data. Combining the Undirected Itemsets Graph with the RSAA algorithm, we propose a new algorithm for discovering association rules for significant rare data in the database. The algorithm uses multiple supports to find the significant rare data. Compared with the RSAA algorithm, it is shown to perform better in both the efficiency of finding significant rare data and the utility of the resulting association rules.
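To make the matrix idea in the first contribution concrete, here is a minimal Python sketch of support counting over a 0/1 transaction-item matrix. It is not the BOM algorithm itself, since the abstract does not specify its matrix operations. The function names (`build_matrix`, `support_count`, `frequent_k_itemsets`), the sample transactions, and the brute-force enumeration of all k-item combinations are assumptions made for illustration only.

```python
from itertools import combinations


def build_matrix(transactions, items):
    """Build a 0/1 matrix: one row per transaction, one column per item."""
    index = {item: j for j, item in enumerate(items)}
    matrix = []
    for t in transactions:
        row = [0] * len(items)
        for item in t:
            row[index[item]] = 1
        matrix.append(row)
    return matrix


def support_count(matrix, cols):
    """Count the rows that contain a 1 in every column listed in cols."""
    return sum(all(row[j] for j in cols) for row in matrix)


def frequent_k_itemsets(transactions, k, min_count):
    """Return {itemset: count} for k-itemsets with count >= min_count.

    Hypothetical helper: it tries every k-combination of columns, which
    is only practical for small item sets and is not the thesis's method.
    """
    items = sorted({item for t in transactions for item in t})
    matrix = build_matrix(transactions, items)
    result = {}
    for cols in combinations(range(len(items)), k):
        count = support_count(matrix, cols)
        if count >= min_count:
            result[tuple(items[j] for j in cols)] = count
    return result


# Illustrative data, not taken from the thesis.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
pairs = frequent_k_itemsets(transactions, k=2, min_count=2)
```

Once the matrix is built, every support count is answered from the matrix rather than from the database, which is the property the abstract attributes to a matrix representation. A real implementation would likely use bit vectors or a numeric array library instead of nested lists.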

Keywords/Search Tags:

data mining, association rules, large database, significant rare data, sequential algorithm, parallel algorithm

Related items

1	Application And Research On Association Rule Mining Algorithm In Large Data Sets
2	Study On Parallel For Association Rules Mining
3	Research On Updated Algorithm Of Parallel Association Rules
4	Research On The Parallel Mining Algorithms For Association Rules
5	Improvement And Parallel Processing Of Association Rules Algorithm On Data Mining
6	The Study Of Parallel Algorithm For Mining Association Rules And Application In Medicine Selling System
7	Research On Association Rules Mining In Data Streams And Its Application
8	Association Rules In Data Mining Research And Of Teaching Quality Assessment
9	The Research And Implementation Of Parallel Association Rules Algorithm Based On Cloud Environment Data Mining
10	Classification Association Rule Induction Algorithm And Applied Research