Mining Positively Correlated Frequent Itemsets

Posted on:2008-03-09

Degree:Master

Type:Thesis

Country:China

Candidate:C K Wang

GTID:2208360215960842

Subject:Computer software and theory

Abstract/Summary:

Data mining refers to discovering hidden, previously unknown knowledge in very large data sets. Association rule mining is one of the important data mining tasks and has been widely studied since it was proposed by R. Agrawal et al. in 1993. It can be divided into two steps: (1) mine all frequent itemsets, and (2) generate strong association rules from the frequent itemsets. The first step dominates the cost of association rule mining. Although generating association rules is simple, it may produce uninteresting rules.

Noticing this problem, some researchers have proposed filtering out many uninteresting association rules by using the lift measure to analyze the correlation between the antecedent and the consequent of each rule. However, this method has two problems: it does not reduce the time spent mining frequent itemsets, and it cannot ensure that the items within the antecedent or within the consequent of a rule are positively correlated. An association rule may still be uninteresting if its antecedent or its consequent contains negatively correlated items.

Based on mathematical expectation, this thesis introduces the concept of positively correlated frequent itemsets and proposes an algorithm that mines them to solve the above problems. The algorithm mines positively correlated frequent itemsets directly in the FP-tree, pushing correlation analysis into the frequent itemset mining process. In this way, fewer frequent itemsets are generated, mining is faster, and uninteresting association rules are avoided when the mined itemsets are later used to generate rules. Moreover, the algorithm further reduces the time spent recursively constructing conditional FP-trees by extracting common items while frequent itemsets are mined.

Our experimental study on benchmark data sets from the UCI Machine Learning Repository shows that the algorithm greatly reduces the number of generated frequent itemsets and significantly improves the efficiency of mining frequent itemsets, performing especially well when data sets are very large and/or very dense.
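
To make the two correlation checks discussed in the abstract concrete, the following minimal Python sketch computes the lift of a rule and tests whether an itemset is positively correlated. The abstract does not give the thesis's exact definition, so the test used here is an assumption: an itemset counts as positively correlated when its observed support exceeds the support expected if its items occurred independently (the product of the single-item supports). The function names and the toy transactions are hypothetical, and the sketch scans a transaction list directly instead of using an FP-tree.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent.

    Computed as supp(A and C together) / (supp(A) * supp(C));
    a value above 1 indicates positive correlation between the two sides.
    """
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / (support(transactions, antecedent) * support(transactions, consequent))


def expected_support(transactions, itemset):
    """Support expected if all items occurred independently (assumed definition)."""
    expected = 1.0
    for item in itemset:
        expected *= support(transactions, [item])
    return expected


def is_positively_correlated(transactions, itemset):
    """Assumed test: observed support exceeds the independence expectation."""
    return support(transactions, itemset) > expected_support(transactions, itemset)


if __name__ == "__main__":
    # Hypothetical toy data: a and b always co-occur, c never occurs with them.
    transactions = [frozenset("ab"), frozenset("ab"), frozenset("c"), frozenset("c")]
    print(lift(transactions, {"a"}, {"b"}))
    print(is_positively_correlated(transactions, {"a", "b"}))
    print(is_positively_correlated(transactions, {"a", "c"}))
```

Lift only compares the antecedent as a whole with the consequent as a whole, which is why the abstract notes it cannot detect negatively correlated items inside either side. A check applied to the itemset itself, like the assumed one above, can be evaluated while itemsets are being mined.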

Keywords/Search Tags:

association rules, frequent itemsets, FP-tree, positive correlation

Related items

1	Mining Negative Association Rules Study Based On Negative Frequent Itemsets
2	Research And Application Of Frequent Itemsets Mining Algorithm
3	An Algorithm And Context Analysis Of Mining Frequent Closet Itemsets
4	Positive And Negative Association Rules Mining Algorithm In The Relational Data Mining
5	Research On The Method Of Condensing Association Rules
6	Association Rules And Incremental Updating Of Association Rules
7	Research On Algorithms For Mining Maximal Frequent Itemsets
8	Research On Top-K Frequent Itemsets Datamining Algorithm
9	Research On Fast Algorithms For Frequent Itemsets Mining Based On Compressed FP-tree
10	Research On Mining Algorithms Of Maximal Frequent Itemsets And Opened Frequent Itemsets