The Research And Application Of Association Rules In Data Mining

Posted on:2008-05-23

Degree:Master

Type:Thesis

Country:China

Candidate:X M Si

Full Text:PDF

GTID:2178360215974254

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

Association mining is an important research branch of the data mining, it aims at discover implicit, interesting rules between attributes, named as Association Rules. It has simple form and can be understood easily, and also is the main measure of deriving knowledge from large databases. As a result, Research and application of the association rules mining have received great attention from researchers in related fields. Many achievements in this field have been obtained.The problem of discovering all of the association rules can be decomposed into two smaller problems: first, find all of the frequent itemsets; second, use the frequent itemsets to generate the strong association rules. Apriori is a classical algorithm of generating frequent itemsets. It acts as a milestone in the fields of data mining. The Apriori algorithm discovers frequent itemsets through an iteration method which called layer by layer search. In this algorithm, the k-itemset is used to generate the (k+l)-itemset, the frequent k-itemsets are extracted from the candidate k-itemsets. First, find the frequent 1-itemsets and record it as L₁, L₁ is intended to find the frequent 2-itemsets (L2), and L₂ for L₃. Search recurrently like this until the frequent k-itemsets cannot be found. A careful scan of the whole database is needed for every L_k. Once all the frequent itemsets were found, the strong association rules can be generated according the minimum confidence. But, the itemsets in the C_k which generate from the kth circulation are the candidate itemsets of the frequent itemsets, and the frequent itemsets must be a subset of the C_k. Every element in the C_k must be contrasted against the whole database to decide whether it can be added to L_k or not. This process is a bottleneck of the algorithm. The Apriori algorithm need repeatedly scan the transaction database which may be very large. While the database used to mining association rules is very large usually, as a result, the cost is very large. Because of the limited memory capacity, I/O load of the system will be very large, time consumed in scanning the database will be very long, and the performance will be dramatically affected. The Apriori algorithm involves three data objects including the set of the transaction (D), the candidate set(C), and the frequent set (L). The algorithm related to three operations: counting the support by scanning the set of the transaction, generating C_k by joining L_k-1 and L_k-1, pruning the C_k by scanning the L_k-1. Obviously, the improvement strategies for Apriori include reducing the scale of these three dataset and enhancing the efficiency of the three operations.We will find when analyze the Apriori algorithm: with the increase of k, not only the number of the k-itemsets is becoming small, but also the number of transaction which including the k-itemsets is becoming small, too. So, when we count the support of every candidate itemset, with the increase of k, we gradually reduce the scale of the transaction be scanned. Consequently, the cost of I/O is reduced dramatically. Furthermore, the Apriori algorithm used the Apriori property to reduce the search space, and the efficiency of generating the frequent itemset is improved. Can we adopt a method to gradually reduce the scale of the C_k in the process of the scanning so as to improve the efficiency of the algorithm? According to the above analysis, in this paper, an algorithm of mining association rules innovated from Apriori algorithm was presented. This algorithm is able to gradually reduce the scale of the transaction used to scanning and gradually reduce the scale of the C_k at the same time. As a result, the performance of the algorithm is significantly improved. At the end of the paper, the author summarized the application fields of the association rules. Further research orientation is also proposed.

Keywords/Search Tags:

data mining, association rule, Apriori, frequent itemset, GBARM

PDF Full Text Request

Related items

1	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
2	Improvement For Apriori Algorithm Of Association Rule Mining
3	The Research On The Algorithms Of Mining Association Rules
4	The Improvement And Research For Array-based Association Rule Mining Algorithm
5	Research And Application Of Association Rule Mining Algorithm
6	Research On Frequent Itemset Mining Algorithm And Its Parallelization Based On Spark
7	Research And Application On Association Rule Mining
8	Data Mining Technique Application Study On Logistics System
9	The Research And Implementation Of Association Rule Data Mining Algorithm
10	Research On Algorithm Of Mining Association Rules Based On Matrix