Font Size: a A A

The Research And Application Of Association Rules In Data Mining

Posted on:2008-05-23Degree:MasterType:Thesis
Country:ChinaCandidate:X M SiFull Text:PDF
GTID:2178360215974254Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Association mining is an important research branch of the data mining, it aims at discover implicit, interesting rules between attributes, named as Association Rules. It has simple form and can be understood easily, and also is the main measure of deriving knowledge from large databases. As a result, Research and application of the association rules mining have received great attention from researchers in related fields. Many achievements in this field have been obtained.The problem of discovering all of the association rules can be decomposed into two smaller problems: first, find all of the frequent itemsets; second, use the frequent itemsets to generate the strong association rules. Apriori is a classical algorithm of generating frequent itemsets. It acts as a milestone in the fields of data mining. The Apriori algorithm discovers frequent itemsets through an iteration method which called layer by layer search. In this algorithm, the k-itemset is used to generate the (k+l)-itemset, the frequent k-itemsets are extracted from the candidate k-itemsets. First, find the frequent 1-itemsets and record it as L1, L1 is intended to find the frequent 2-itemsets (L2), and L2 for L3. Search recurrently like this until the frequent k-itemsets cannot be found. A careful scan of the whole database is needed for every Lk. Once all the frequent itemsets were found, the strong association rules can be generated according the minimum confidence. But, the itemsets in the Ck which generate from the kth circulation are the candidate itemsets of the frequent itemsets, and the frequent itemsets must be a subset of the Ck. Every element in the Ck must be contrasted against the whole database to decide whether it can be added to Lk or not. This process is a bottleneck of the algorithm. The Apriori algorithm need repeatedly scan the transaction database which may be very large. While the database used to mining association rules is very large usually, as a result, the cost is very large. Because of the limited memory capacity, I/O load of the system will be very large, time consumed in scanning the database will be very long, and the performance will be dramatically affected. The Apriori algorithm involves three data objects including the set of the transaction (D), the candidate set(C), and the frequent set (L). The algorithm related to three operations: counting the support by scanning the set of the transaction, generating Ck by joining Lk-1 and Lk-1, pruning the Ck by scanning the Lk-1. Obviously, the improvement strategies for Apriori include reducing the scale of these three dataset and enhancing the efficiency of the three operations.We will find when analyze the Apriori algorithm: with the increase of k, not only the number of the k-itemsets is becoming small, but also the number of transaction which including the k-itemsets is becoming small, too. So, when we count the support of every candidate itemset, with the increase of k, we gradually reduce the scale of the transaction be scanned. Consequently, the cost of I/O is reduced dramatically. Furthermore, the Apriori algorithm used the Apriori property to reduce the search space, and the efficiency of generating the frequent itemset is improved. Can we adopt a method to gradually reduce the scale of the Ck in the process of the scanning so as to improve the efficiency of the algorithm? According to the above analysis, in this paper, an algorithm of mining association rules innovated from Apriori algorithm was presented. This algorithm is able to gradually reduce the scale of the transaction used to scanning and gradually reduce the scale of the Ck at the same time. As a result, the performance of the algorithm is significantly improved. At the end of the paper, the author summarized the application fields of the association rules. Further research orientation is also proposed.
Keywords/Search Tags:data mining, association rule, Apriori, frequent itemset, GBARM
PDF Full Text Request
Related items