
Association Rule Mining Algorithms Based On Boundary Idea

Posted on: 2009-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: L L Wang
Full Text: PDF
GTID: 2178360242489888
Subject: Computer applications
Abstract/Summary:
Association rule mining is an important problem in data mining. Its success in commercial applications has made it one of the most mature and important research areas in the field. Association rule mining consists of two steps: mining the frequent itemsets, and then generating strong association rules from them. Because the first step determines the overall performance of mining, the study of frequent itemset mining is of great significance. Data in a transaction database exists in a certain context, such as time, place and customer, but this context has been ignored in traditional association rule mining. Multi-dimensional association rules can provide more useful information about the real world, so their study is of practical significance and has broad prospects.

First, this paper introduces breadth-first, depth-first, and hybrid breadth- and depth-first search algorithms for mining frequent itemsets, specifically Apriori, FP-growth, Eclat, Top and Bottom boundary, Diffset, RCFP and LR, among others. Using a specific transaction database as an example, the paper describes the storage structures for frequent itemsets used by the last five algorithms, and compares and analyzes these structures and their depth during tree construction.

Secondly, the paper presents an algorithm (Left and Right boundary Re-condensed Frequent Pattern, LR-RCFP) for frequent itemset mining, based on the boundary idea and the RCFP-tree. The LR-RCFP algorithm uses the compressed storage structure for frequent itemsets from the RCFP algorithm and the left and right boundary idea from the LR algorithm. Experiments were carried out on 6 benchmark datasets from the UCI Machine Learning Repository, and the results show that the LR-RCFP algorithm is more efficient and stable than the Eclat, Diffset, Top and Bottom boundary, RCFP and LR algorithms.

Finally, the paper proposes an algorithm (Multi-dimensional Left and Right boundary Re-condensed Frequent Pattern, MLR-RCFP) for multi-dimensional association rule mining, built on the LR-RCFP algorithm. It uses the LR-RCFP algorithm to mine frequent itemsets and the RCFP-tree structure to mine frequent predicate sets. The paper introduces the "Co-constrain" idea for mining multi-frequent itemsets and multi-dimensional association rules during frequent predicate set mining. Experiments were carried out on 6 benchmark datasets from the UCI Machine Learning Repository, and the results and analysis show that the MLR-RCFP algorithm is more efficient than the MFP and MPIT algorithms; as the number of dimensions rises, its running time rises more slowly.
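The two-step process described in the abstract (mine frequent itemsets, then derive strong rules) can be illustrated with a minimal Python sketch. This is not the LR-RCFP or MLR-RCFP algorithm from the thesis, whose details are not given here; it is a plain Eclat-style depth-first search over a vertical layout (item to transaction-id set). The function names `eclat` and `strong_rules`, the toy transactions, and the thresholds are all assumptions made for illustration.

```python
from itertools import combinations


def eclat(transactions, min_support):
    """Step 1 (illustrative): depth-first frequent itemset mining.

    Uses a vertical layout in which each item maps to the set of ids of
    the transactions that contain it; support is the size of that set.
    """
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}  # frozenset(itemset) -> absolute support count

    def extend(prefix, candidates):
        # candidates: (item, tidset of prefix + item) pairs in a fixed order
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_support:
                continue  # infrequent itemsets cannot have frequent supersets
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the remaining candidates to form the next level.
            suffix = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
            extend(itemset, suffix)

    extend(frozenset(), sorted(tidsets.items(), key=lambda kv: kv[0]))
    return frequent


def strong_rules(frequent, min_confidence):
    """Step 2 (illustrative): rules A -> B with confidence >= min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is always available in `frequent`.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical toy data, not taken from the thesis or from UCI.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent = eclat(transactions, min_support=3)
rules = strong_rules(frequent, min_confidence=0.8)
```

In this sketch almost all of the work happens in `eclat`, which is consistent with the abstract's remark that the first step determines overall mining performance; the rule-generation step only looks up supports that were already computed.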
Keywords/Search Tags:Association Rule, Multi-dimensional Association Rule, Frequent Itemsets, LR, Left and Right boundary, RCFP-tree