Font Size: a A A

Researches On Algorithms For Mining Top-K Frequent Patterns

Posted on:2012-02-25Degree:MasterType:Thesis
Country:ChinaCandidate:C ZhengFull Text:PDF
GTID:2178330335950575Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Data Mining is the theory and method on researching how to mine knowledge from data in very large databases in nontrivial methods, which is one of the most cutting-edge researches in the database and decision-making area of the world. Association rule is an early, significant topic of data mining. In the process of mining association rules, frequent pattern mining is the most important part. How to mine the frequent patterns efficiently has been the focus of attention. However, in practice, because of the huge number of frequent patterns, the application of frequent patterns is impeded. Therefore, how to compress the frequent patterns becomes an important research direction.In this paper, we firstly introduced the concept and basic technology about association rules and frequent patterns, followed by a detailed desription of the technology on frequent pattern compression. Then we briefly analyzed several algorithms of effective frequent pattern compression. Finally, we proposed three algorithms for mining Top-K frequent patterns.(1) One of the algorithms is called ATFP, based on the Apriori algorithm. The algorithm follows the basic idea of Apriori, but during the mining process border support is used for pruning the candidate patterns instead of minimum support. However, due to the use of iteration idea, its mining efficiency is not so high compared with the other Top-K mining algorithms.(2) So in this paper, we then propose another algorithm based on the hybrid search strategy called MSTFP to improve the efficiency. It is a combination of ATFP and Top-K FP-growth. The breadth-first search strategy is used for mining the initial patterns. Then we use the depth-first strategy for the long patterns.(3) The final algorithm TFCP is based on the horizon fomat. This algorithm mainly uses the idea similar to the vertical mining method to mine frequent closed patterns. Using the restrictions in the TFP algorithm to optimize the result set at the same time.In this paper, we conduct the extensive performance test for the proposed algorithms, a series of experiments were conducted on the 19 data sets singled out from the UCI machine learning repository and 2 data sets from the IBM data generator. The experimental results have shown that, compared with the Top-K FP-growth. ATFP is slightly inferior. But the improved algorithm MSTFP performs extremely well between Top-K FP-growth and Ex-Miner. The algorithm TFCP we proposed in this paper is more efficient while mining the long patterns compared with TFP and CLOSET+. These results on frequent patterns could be efficiently applied to many practical problems.
Keywords/Search Tags:association rule, frequent pattern, pattern compression, Top-K, frequent closed pattern
PDF Full Text Request
Related items