Research and Application of Association Rules Mining Based on FP-growth Algorithm

Posted on: 2007-09-20; Degree: Master; Type: Thesis
Country: China; Candidate: X P Liu
GTID: 2178360185965994; Subject: Software engineering
Abstract/Summary:
Mining association rules from large datasets is one of the most important research areas in data mining. Association rules reveal interesting relationships between itemsets, so they are widely applied in fields such as marketing and sales, medicine, finance, biology, telecommunications and agriculture. Since R. Agrawal and R. Srikant first proposed the concept of association rules in 1993, many algorithms have been developed for mining them. The FP-growth algorithm is currently one of the most popular of these: it finds the frequent itemsets from which association rules are derived without generating candidate itemsets. However, on large datasets it suffers from poor space utilization and long execution times.
To overcome these drawbacks, this thesis builds on FP-growth and proposes two new algorithms for mining association rules from large datasets: New-Algorithm 1 and New-Algorithm 2. Both divide a large dataset into many subsets and then perform constrained frequent itemset mining on each subset, but they use different strategies to build the subsets. New-Algorithm 1 scans the dataset once for every frequent 1-itemset and constructs one subset per scan. New-Algorithm 2 first converts the dataset into datalists that hold the transaction information and then derives the subsets from these datalists: it deletes the first item of the first datalist, adds the remaining items to the other datalists, and repeats the same process for the second datalist, and so on.
Experiments were conducted to compare the proposed algorithms with FP-growth. The results show that New-Algorithm 1 and New-Algorithm 2 use less memory and are therefore faster than FP-growth when the minimum support is low or the dataset is very large.
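The two subset-building strategies can be illustrated with a small Python sketch. This is one plausible reading of the abstract, not the thesis's implementation, and it rests on stated assumptions: frequent items are ranked by descending support (ties broken by name); the subset for a frequent item x holds, for every transaction containing x, the frequent items ranked after x; and "constrained" mining of that subset means finding only the frequent itemsets that contain x. The function names `rank_frequent_items`, `subsets_by_rescans`, `subsets_by_datalists` and `mine_constrained` are hypothetical. Under this reading both strategies yield the same subsets and differ only in how they build them (one full scan per frequent item versus one pass followed by redistribution), which is consistent with the reported gap in subset-creation time.

```python
from collections import Counter


def rank_frequent_items(transactions, min_support):
    """Frequent 1-items ordered by descending support, ties broken by name."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [item for item, c in counts.items() if c >= min_support]
    return sorted(frequent, key=lambda item: (-counts[item], item))


def subsets_by_rescans(transactions, order):
    """Assumed reading of New-Algorithm 1: one dataset scan per frequent item."""
    rank = {item: r for r, item in enumerate(order)}
    subsets = {}
    for x in order:  # one full scan for each frequent 1-itemset
        subsets[x] = [
            tuple(sorted((i for i in set(t) if rank.get(i, -1) > rank[x]), key=rank.get))
            for t in transactions
            if x in t
        ]
    return subsets


def subsets_by_datalists(transactions, order):
    """Assumed reading of New-Algorithm 2: build datalists once, then redistribute."""
    rank = {item: r for r, item in enumerate(order)}
    datalists = {x: [] for x in order}
    for t in transactions:
        items = sorted({i for i in t if i in rank}, key=rank.get)
        if items:
            datalists[items[0]].append(items)  # file under its first item
    subsets = {}
    for x in order:
        # Every entry here starts with x; the items after x form x's subset.
        subsets[x] = [tuple(entry[1:]) for entry in datalists[x]]
        for entry in datalists[x]:
            rest = entry[1:]  # delete the first item ...
            if rest:
                datalists[rest[0]].append(rest)  # ... and pass the rest on
    return subsets


def mine_constrained(x, subset, min_support):
    """Frequent itemsets that must contain x, mined from x's subset only."""
    results = {}

    def grow(prefix, rows):
        # rows: one tuple per transaction containing all of prefix,
        # holding only the items ranked after the last prefix item
        results[prefix] = len(rows)
        counts = Counter(i for r in rows for i in r)
        for item, c in counts.items():
            if c >= min_support:
                projected = [r[r.index(item) + 1:] for r in rows if item in r]
                grow(prefix + (item,), projected)

    grow((x,), subset)
    return results


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"], ["a", "b", "c", "e"]]
    order = rank_frequent_items(db, min_support=2)
    s1 = subsets_by_rescans(db, order)
    s2 = subsets_by_datalists(db, order)
    assert all(sorted(s1[x]) == sorted(s2[x]) for x in order)  # same subsets
    for x in order:
        for itemset, support in mine_constrained(x, s2[x], min_support=2).items():
            print(itemset, support)
```

Run as a script, it checks that both strategies produce identical subsets for a five-transaction toy dataset and prints the frequent itemsets found in each subset. Using a simple recursive projection inside `mine_constrained`, rather than building an FP-tree per subset, is a simplification for brevity.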
Experimental results also show that New-Algorithm 2 is faster than New-Algorithm 1 because it spends less time creating subsets. The thesis first describes the two new algorithms and then uses an application to illustrate how they find association rules in large datasets.
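The abstract does not say how rules are produced once the frequent itemsets are known. A common final step, sketched below under the standard support/confidence framework (an assumption, since the thesis's own rule-generation procedure is not described here), keeps a rule X ⇒ Y when support(X ∪ Y) / support(X) reaches a minimum confidence. The functions `support_counts` and `generate_rules`, the toy transactions and the thresholds (support 2, confidence 0.7) are hypothetical; the brute-force counting merely stands in for the frequent itemsets the proposed algorithms would supply.

```python
from itertools import combinations


def support_counts(transactions, min_support):
    """Brute-force frequent itemset counting (adequate for a toy example only)."""
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))
    frequent = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in sets if candidate <= t)
            if count >= min_support:
                frequent[candidate] = count
                found_any = True
        if not found_any:
            break  # downward closure: no frequent k-itemsets, so none larger
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for each strong rule."""
    rules = []
    for itemset, sup in frequent.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                lhs = frozenset(combo)
                conf = sup / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"], ["a", "b", "c", "e"]]
    for lhs, rhs, sup, conf in generate_rules(support_counts(db, 2), 0.7):
        print(sorted(lhs), "=>", sorted(rhs), f"support={sup}, confidence={conf:.2f}")
```

Because every subset of a frequent itemset is itself frequent, the antecedent's support is always available in `frequent`, so no extra database scan is needed at this stage.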
Keywords/Search Tags: constrained frequent itemset mining, data mining, association rule, candidate generation, FP-growth