Research On All Frequent Itemsets Mining Algorithm And Its Application To The Classification Area

Posted on: 2010-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2178360302960354
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With current social and technological advances, the amount of information grows exponentially, and extracting useful knowledge from such large volumes of interrelated information has become one of the critical problems in the data mining field. Frequent itemsets offer an effective approach. Frequent itemsets are the itemsets, extracted from large amounts of data, whose support meets the support threshold. They contain a great deal of potentially useful knowledge and can effectively provide decision support. Currently, frequent itemset mining algorithms based on the Apriori principle are effective on sparse datasets and short patterns but not on dense datasets and long patterns, which greatly limits their application.

An improved Apriori algorithm is proposed for mining dense datasets and long patterns, which current frequent itemset mining algorithms cannot handle effectively. The new algorithm combines a vertical data structure with an intersection method and uses an index vector table to generate candidate 2-itemsets. It also uses the infrequent 2-itemsets to prune the candidate itemsets and adopts a prefixArray to optimize the intersection method. The experimental results show that the improved algorithm can effectively mine frequent itemsets on dense datasets and long patterns. To further improve counting efficiency, the idea of diffsets is introduced into the Apriori algorithm: support counting by intersecting tidsets is replaced with counting by intersecting diffsets, which further improves the algorithm.

Finally, the results of the frequent itemset research are applied to classification. Traditional classification algorithms have two shortcomings: the classification process is a black-box operation, and the classification results are hard to explain. Association rule classification based on frequent itemsets can solve these problems effectively. However, because effective evaluation measures for rules are lacking, its classification accuracy is generally not high. To address this, a new association rule classification algorithm is proposed. It introduces an "interest" criterion, which can effectively remove redundant rules, and uses "weight" to sort the rules. The experiments show that these techniques can improve classification accuracy.
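
The tidset and diffset counting mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy transaction data and the function names (build_tidsets, support_by_tidsets, support_by_diffset) are assumptions made for illustration, and the index vector table, prefixArray, and pruning steps are not modeled. It only shows the general idea of a vertical layout, where support is counted by intersecting transaction-ID sets, and the standard diffset identity support(P + x) = support(P) - |t(P) - t(x)|.

```python
# Toy transaction database (hypothetical data, for illustration only).
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "c"},
    3: {"a", "d"},
    4: {"b", "c"},
    5: {"a", "b", "c", "d"},
}


def build_tidsets(db):
    """Vertical layout: map each item to the set of transaction IDs containing it."""
    tidsets = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support_by_tidsets(tidsets, itemset):
    """Support of a non-empty itemset = size of the intersection of its items' tidsets."""
    return len(set.intersection(*(tidsets[i] for i in itemset)))


def support_by_diffset(tidsets, prefix, item):
    """Support of prefix + {item}, counted through a diffset.

    diffset = t(prefix) - t(item): transactions containing the prefix but not the item.
    support(prefix + item) = support(prefix) - |diffset|.
    """
    prefix_tids = set.intersection(*(tidsets[i] for i in prefix))
    diffset = prefix_tids - tidsets[item]
    return len(prefix_tids) - len(diffset)


tidsets = build_tidsets(transactions)
print(support_by_tidsets(tidsets, ["a", "c"]))   # counted by tidset intersection
print(support_by_diffset(tidsets, ["a"], "c"))   # same itemset, counted by diffset
```

Both calls count the support of the same itemset {a, c}, so they print the same number. On dense data, where most transactions contain most items, the diffset is typically much smaller than the tidset intersection it replaces, which is the usual motivation for counting with diffsets.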
Keywords/Search Tags: data mining, association rules, frequent itemsets, associative classification rules, Apriori algorithm