Mining Positively Correlated Frequent Itemsets

Posted on: 2008-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: C K Wang
GTID: 2208360215960842
Subject: Computer software and theory
Abstract/Summary:
Data mining is the discovery of hidden, previously unknown knowledge in very large data sets. Association rule mining is one of its important tasks and has been widely studied since it was proposed by R. Agrawal et al. in 1993. It can be divided into two steps: (1) mine all frequent itemsets, and (2) generate strong association rules from the frequent itemsets. The first step dominates the cost of association rule mining. Although the rule-generation step is simple, it may produce uninteresting rules.

To address this problem, some researchers have proposed filtering out uninteresting association rules by using the lift measure to analyze the correlation between the antecedent and the consequent of a rule. However, this method has two shortcomings: it does not reduce the time spent mining frequent itemsets, and it does not ensure that the items within the antecedent or within the consequent of a rule are positively correlated. An association rule may still be uninteresting if its antecedent or its consequent contains negatively correlated items.

Based on mathematical expectation, this thesis introduces the concept of positively correlated frequent itemsets and proposes an algorithm for mining them that addresses the problems above. The algorithm mines positively correlated frequent itemsets directly in the FP-tree, pushing correlation analysis into the frequent itemset mining process. This reduces the number of frequent itemsets generated, speeds up mining, and avoids generating uninteresting association rules when the mined itemsets are used to generate rules. Moreover, the algorithm further reduces the time spent recursively constructing conditional FP-trees by extracting common items while frequent itemsets are mined.

Experiments on benchmark data sets from the UCI Machine Learning Repository show that the algorithm greatly reduces the number of generated frequent itemsets and significantly improves the efficiency of mining frequent itemsets, performing especially well when data sets are very large and/or very dense.
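To make the correlation idea in the abstract concrete, the following minimal Python sketch computes the standard lift of a rule and applies one possible expectation-based test to an itemset. The abstract does not give the thesis's exact definition of a positively correlated itemset, so the test used here (observed support greater than the support expected if the items were independent) is an assumption. The function names and the toy transactions are hypothetical, and the sketch scans transactions directly rather than using an FP-tree.

```python
from math import prod


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def lift(transactions, antecedent, consequent):
    """Standard lift: supp(A and B) / (supp(A) * supp(B)).

    Assumes both sides have non-zero support.
    """
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / (
        support(transactions, antecedent) * support(transactions, consequent)
    )


def is_positively_correlated(transactions, itemset):
    """Assumed test, not taken from the thesis: the observed support
    must exceed the support expected under item independence."""
    expected = prod(support(transactions, [item]) for item in itemset)
    return support(transactions, itemset) > expected


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"butter"},
    {"bread", "milk", "butter"},
]
```

On this toy data, bread and milk occur together more often than independence would predict, so the itemset passes the test and the rule bread -> milk has lift above 1, while milk and butter fail the test. A lift-only filter checks the rule as a whole; the itemset-level test is what would allow pruning during mining, which is the motivation the abstract describes.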
Keywords/Search Tags: association rules, frequent itemsets, FP-tree, positive correlation