Font Size: a A A

Research On Mining Algorithms For Association Rules

Posted on:2008-12-23Degree:MasterType:Thesis
Country:ChinaCandidate:H T WuFull Text:PDF
GTID:2178360242998655Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Data mining is to extract or mine knowledge from large amount of data. So it is also called Knowledge Discovery in Database. It is a new subject that involves a lot of subjects and develops with these subjects. With the progress of Information Technology,the application of data mining becomes more and more widely. Therefore, data mining is more and more important among the technologies of database.One of the more well-studied problems in data mining is the mining for association rules. And mining association rules can be decomposed into two sub problems. First, we need to identify all sets of items that are contained in a sufficient number of transactions above the minimum (support) requirement. These itemsets are referred to as large itemsets. Once all large itemsets are obtained, the desired association rules can be generated in a straightforward manner. At present, lots of algorithms for mining association rules have been brought forward. The most famous algorithms are Apriori and its transfiguration. These traditional algorithms often suffer from the bottleneck of itemsets generation and all items in a basket database are treated uniformly.Firstly, in this paper, a new algorithm called Apriori_BW is proposed to mine weighted binary association rules. In the algorithm, each item in the database is marked with a weight. In order to adopting the existing fast algorithms for finding binary association rules, such as Apriori, the weight of every item is between 0 and 1 . Consequently, a function is defined to product the total weight of the candidate itemsets. Items and itemsets are given weights to reflect their importance to the user in this algorithm. At the same time, the bottleneck of itemsets generation is avoided. Experiments for the algorithm are shown that with the same minimum support, number of candidate itemset generated by Apriori_BW is smaller than Apriori. And some itemsets,which didn't appear in the candidate set generated by Apriori, appeared in the candidate sets generated by Apriori_BW.The processing in the initial iterations in face dominates the total execution cost. A performance study was provided to compare various algorithms of generating large itemsets. It was shown that for these algorithmsthe candidate set generated during an early iteration is generally, in orders of magnitude,larger than the set of large itemsets it really contains.Therefore,the initial candidate set generation is the key issue to improve the performance of data mining. To address this issue, an effective hash-based algorithm is proposed to generate a smaller candidate set than previous method. The algorithm is called Apriori_BH. The generation of smaller candidate sets enables us to effective trim the transaction database size at a much earlier stage of the iterations, thereby reducing the computational cost for later iterations significantly. Especially, both large 1-itemsets and 2-itemsets are discovered through scanning database once. The number of transactions to be scanned is reducing and the number of items in each transaction is trimming during large itemset discovery. Note that Apriori_BH is particularly powerful to determine large itemsets in early stages, thus improving the performance bottleneck. The size of Ck decreases significantly in later stages, thus rendering little justification its further filtering. This is the very reason that we use improved Apriori for later iterations. In the improved Apriori, the number of candidate k-itemsets is reducing during scanning the database. The experiments for Apriori_BH are shown that the number of candidate itemsets generated by Apriori_BH is smaller than those generated by Apriori.Both Apriori_BW and Apriori_BH algorithms are used to mining algorithms are used to mining association rules that are single layer and single-dimension. The algorithms are improved to be used to mining multi-layer and multi-dimensional association rules in the next research.
Keywords/Search Tags:data mining, association rule, apriori, weight, hash, itemset
PDF Full Text Request
Related items