Supporting efficient and scalable frequent pattern mining

Posted on:2006-08-24

Degree:Ph.D

Type:Dissertation

University:Hong Kong University of Science and Technology (People's Republic of China)

Candidate:Liu, Guimei

Full Text:PDF

GTID:1458390008953440

Subject:Computer Science

Abstract/Summary:

With the large amount of data collected in various applications, data mining has become an essential tool for data analysis and decision support. Frequent itemset mining is an important problem in the data mining area with a wide range of applications. In this dissertation, we investigate several techniques to support efficient and scalable frequent itemset mining.; We first identify the key factors of a frequent itemset mining algorithm, and propose an algorithm AFOPT for efficient mining of frequent itemsets. The AFOPT algorithm adopts the pattern growth approach. It uses a new structure, called Ascending Frequency Ordered Prefix-Trie (or AFOPT for short), and two other structures to store conditional databases. An adaptive strategy is employed to choose proper structures according to the density of the conditional databases, which makes the AFOPT algorithm perform well on both sparse and dense databases.; To deal with very large databases with millions of transactions, we propose a Search Space based Partitioning approach SSP for scalable out-of-core mining. The novel idea of SSP is to partition the database based on the search space. Different partitions share data but not frequent itemsets. As such, frequent itemsets can be mined from each partition without the need to scan the whole database to verify their support. Furthermore, techniques are developed so that the available memory can be utilized during the mining process to minimize disk I/O. Our experiment results show that the proposed algorithms achieve significant performance improvement over existing out-of-core mining algorithms.; Many decision support systems need to support online interactive frequent itemset mining, which is very challenging because frequent itemset mining is a computation intensive repetitive process. We propose a compact disk-based data structure---CFP-tree (Condensed Frequent Pattern tree) for storing precomputed frequent itemsets to support online mining requests. Efficient algorithms for retrieving frequent itemsets and association rules from a CFP-tree, as well as efficient algorithms to construct and update a CFP-tree, are developed. One obstacle to online association rule mining is the undesirably large result size. We introduce the concept of frequent itemset class and the class association rule to reduce the result size without information loss. The benefits of organizing itemsets into classes are two-fold. On the one hand, the output size is reduced and it is easier for users to browse the results. On the other hand, the computation cost is saved because the rule generation cost of the itemsets in the same class is shared. Our performance study demonstrates that with CFP-tree and the concept of frequent itemset class, frequent itemset and association rule mining requests can be responded to promptly.

Keywords/Search Tags:

Mining, Frequent, Support, Efficient, Association rule, Data, Pattern, Scalable

Related items

1	Research And Application On Association Rule Mining
2	The Research On The Related Problems Of Association Rule Mining
3	Research And Implementation Of Mining Association Rules In Retailing
4	Research And Application Of Association Rule Mining Algorithm
5	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
6	Research On The Mining Algorithms Of Association Rule
7	The Research And Implementation Of Association Rule Data Mining Algorithm
8	The Research Of Association Rules Algorithm Based On Frequent Pattern Tree
9	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
10	Study And Design On The Algorithms Of Mining Association Rules