Research On All Frequent Itemsets Mining Algorithm And Its Application To The Classification Area

Posted on:2010-12-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

GTID:2178360302960354

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

With current advances in technology, the amount of information is growing exponentially. How to extract useful knowledge from such large amounts of interrelated information has become one of the critical problems in the data mining field. Frequent itemsets offer an effective approach: they are sets of items, extracted from large amounts of data, whose support meets the support threshold. They contain a great deal of potentially useful knowledge and can effectively provide decision support. Current frequent itemset mining algorithms based on the Apriori principle are effective on sparse datasets and short patterns, but not on dense datasets and long patterns, which greatly limits their application.

An improved Apriori algorithm is proposed to handle mining on dense datasets and long patterns, which current frequent itemset mining algorithms cannot handle effectively. The new algorithm combines a vertical data structure with an intersection method and uses an index vector table to generate candidate 2-itemsets. It also uses the non-frequent 2-itemsets to prune the candidate itemsets and adopts a prefixArray to optimize the intersection method. The experimental results show that the improved algorithm can effectively mine frequent itemsets on dense datasets and long patterns. To further improve counting efficiency, the idea of diffsets is introduced: the former counting by tidset intersection is replaced with counting by diffsets, which further improves the Apriori algorithm.

Finally, the results of the frequent itemset research are applied to the classification field. Traditional classification algorithms have two shortcomings: the classification process is a black-box operation, and the classification results are hard to explain. Association rule classification based on frequent itemsets can solve these problems effectively. However, because effective evaluation measures for rules are lacking, its classification accuracy is generally not high. To address this, a new association rule classification algorithm is proposed. It introduces an "interest" criterion, which can effectively remove redundant rules, and uses a "weight" to sort the rules. The experiments show that these techniques can improve classification accuracy.
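
The difference between tidset counting and diffset counting mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy transactions, and the restriction to a 3-itemset are assumptions made for illustration, and the index vector table and prefixArray optimizations are not shown. The diffset relations used are the standard ones: d(xy) = t(x) - t(y), d(xyz) = d(xz) - d(xy), and sup(xyz) = sup(xy) - |d(xyz)|.

```python
def to_vertical(transactions):
    """Build the vertical layout: item -> set of transaction ids (tidset)."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support_by_tidsets(tidsets, itemset):
    """Support = size of the intersection of the items' tidsets."""
    return len(set.intersection(*(tidsets[i] for i in itemset)))


def support_by_diffsets(tidsets, x, y, z):
    """Support of {x, y, z} using diffsets instead of tidset intersections."""
    d_xy = tidsets[x] - tidsets[y]        # transactions with x but not y
    d_xz = tidsets[x] - tidsets[z]        # transactions with x but not z
    sup_xy = len(tidsets[x]) - len(d_xy)  # sup(xy) = sup(x) - |d(xy)|
    d_xyz = d_xz - d_xy                   # transactions with x and y but not z
    return sup_xy - len(d_xyz)            # sup(xyz) = sup(xy) - |d(xyz)|


# Hypothetical toy dataset (illustrative only, not taken from the thesis).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
tidsets = to_vertical(transactions)
print(support_by_tidsets(tidsets, ("a", "b", "c")))
print(support_by_diffsets(tidsets, "a", "b", "c"))
```

Both functions return the same support. On dense data the diffsets tend to be much smaller than the tidsets they replace, which is the usual motivation for counting with them.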

Keywords/Search Tags:

data mining, association rules, frequent itemsets, associative classification rules, Apriori algorithm

Related items

1	Research On Association Rules Mining And Associative Classification Based On Bit Table
2	Research On Algorithms For The Association Rules In Data Mining
3	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
4	Association Rules Algorithm And Its Applications In Medical Data Mining
5	Research On The Method Of Condensing Association Rules
6	Association Rule Mining Technology Improvements In Computer Forensics
7	Research Of Association Rules Mining Algorithm Based On Graph
8	Research And Application Of Frequent Itemsets Mining Algorithm
9	An Algorithm And Context Analysis Of Mining Frequent Closet Itemsets
10	Research On Mining Technology Of Association Rules And Meta-Rules