
Sampling: An efficient solution for data mining of association rules

Posted on: 2004-02-27 | Degree: M.C.Sc | Type: Thesis
University: Dalhousie University (Canada) | Candidate: Zhu, Jingbo | Full Text: PDF
GTID: 2468390011975970 | Subject: Computer Science
Abstract/Summary:
A classic problem in data mining is to find association rules between items in a large dataset of transactions, where a transaction is a set of related items. For example, the items of a transaction might have been purchased at the same time. An association rule predicts the likelihood that an item appears in a transaction together with other items. The first step in finding association rules is discovering frequent itemsets.

In this thesis, we explore sampling techniques that can be used to find frequent itemsets. In particular, we compare three sampling algorithms: Simple Random Sampling with Replacement (SRSWR), Finding Associations from Sampled Transactions (FAST), and Finding Associations from Sampled Transaction Randomly (FASTRan). The first two algorithms, SRSWR and FAST, are previously known, whereas FASTRan is a new algorithm that we obtained by modifying FAST. Our experiments show that FAST and FASTRan produce significantly more accurate results than SRSWR, and that our modified algorithm, FASTRan, performs slightly better than FAST.
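To make the terms in the abstract concrete, the following is a minimal Python sketch of frequent-itemset counting, rule confidence, and simple random sampling with replacement (SRSWR). It is an illustration under stated assumptions, not the thesis's implementation: the function names, the toy transactions, and the brute-force counting are all hypothetical, and FAST and FASTRan are not shown because the abstract does not describe how they work.

```python
import random
from collections import Counter
from itertools import combinations

# Toy data (hypothetical): each transaction is a set of items bought together.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of at most max_size items.

    Support is the fraction of transactions containing the itemset.
    Brute-force enumeration, suitable only for small examples.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def confidence(transactions, antecedent, consequent):
    """Estimate confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    return n_both / n_a if n_a else 0.0


def srswr(transactions, sample_size, rng):
    """Simple random sampling with replacement: draws are independent,
    so the same transaction may appear more than once in the sample."""
    return rng.choices(transactions, k=sample_size)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    sample = srswr(TRANSACTIONS, 3, rng)
    print("full data:", frequent_itemsets(TRANSACTIONS, 0.4))
    print("sample   :", frequent_itemsets(sample, 0.4))
    print("conf(bread -> milk):", confidence(TRANSACTIONS, {"bread"}, {"milk"}))
```

Mining the sample instead of the full dataset trades accuracy for speed: itemsets whose support is near the threshold may be missed or wrongly reported, which is the kind of error the comparison in the abstract is concerned with.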
Keywords/Search Tags: Association, FAST, Sampling, Items, Transaction, FASTRan