
The Research Of Association Rule Sampling Algorithm Based On Data Warehouse

Posted on: 2007-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: H Ding
Full Text: PDF
GTID: 2178360185466859
Subject: Computer application technology
Abstract/Summary:
Data mining is the process of discovering interesting knowledge from large volumes of data stored in databases, data warehouses, or other information repositories. It covers many techniques, such as association rule mining, prediction, classification, clustering, and evolutionary analysis. Of these, association rule mining is the most important and the most widely used. The concept of an association rule was first proposed in 1993 by Dr. Rakesh Agrawal, then working at IBM, to describe frequent relationships between items in transaction databases. The paper first studies some typical association rule mining algorithms, such as Apriori, AprioriTid, AprioriHybrid, and FUP2.

Sampling is an appealing technique for data mining because, in most cases, approximate solutions already satisfy the needs of users. We attempt to use sampling techniques to address the problem of maintaining discovered association rules. Some studies have addressed the problem of maintaining discovered association rules when updates are made to the database. All proposed methods must examine not only the changed part of the database but also the unchanged part of the original database, which is very large, and hence they take much time. Worse yet, if the database is updated frequently and the rules are re-derived each time, but the underlying rule set has not changed much, then most of that effort is wasted.

In this paper, we devise an algorithm that employs sampling techniques to estimate the difference between the association rules in a database before and after the database is updated. The estimated difference can be used to decide whether or not the mined association rules should be updated. If the estimated difference is small, then the rules in the original database are still a good approximation to those in the updated database. Hence, we do not have to...
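As a rough illustration of the idea in the abstract, the following Python sketch estimates the support of one itemset from random samples of the original and updated databases, then builds a confidence interval for the change in support. It is not the thesis's algorithm: the difference measure (change in support of a single itemset), the normal-approximation interval, and all function names and parameters (`support_in_sample`, `support_difference_interval`, `needs_remining`, `tolerance`, `z`) are assumptions made for this example.

```python
import math
import random


def support_in_sample(transactions, itemset, sample_size, rng):
    """Estimate the support of `itemset` from a random sample drawn with replacement."""
    itemset = frozenset(itemset)
    hits = sum(
        1
        for _ in range(sample_size)
        if itemset.issubset(rng.choice(transactions))
    )
    return hits / sample_size


def support_difference_interval(p_old, p_new, n_old, n_new, z=1.96):
    """Normal-approximation confidence interval for p_new - p_old.

    p_old, p_new: sampled support before and after the update.
    n_old, n_new: the two sample sizes.
    z: normal quantile (1.96 corresponds to roughly 95% confidence).
    """
    diff = p_new - p_old
    std_err = math.sqrt(
        p_old * (1 - p_old) / n_old + p_new * (1 - p_new) / n_new
    )
    return diff - z * std_err, diff + z * std_err


def needs_remining(interval, tolerance):
    """Suggest re-mining unless the whole interval lies within +/- tolerance."""
    low, high = interval
    return low < -tolerance or high > tolerance


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical toy data: each transaction is a set of items.
    old_db = [{"bread", "milk"}] * 600 + [{"bread"}] * 400
    new_db = old_db + [{"bread", "milk"}] * 50 + [{"milk"}] * 50

    n = 500
    p_old = support_in_sample(old_db, {"bread", "milk"}, n, rng)
    p_new = support_in_sample(new_db, {"bread", "milk"}, n, rng)
    interval = support_difference_interval(p_old, p_new, n, n)
    print("estimated change in support:", interval)
    print("re-mine suggested:", needs_remining(interval, tolerance=0.1))
```

In this sketch a full re-mining pass is suggested only when the interval cannot rule out a change larger than `tolerance`; otherwise the previously mined rules are kept as an approximation. A practical version would need to aggregate over many itemsets or rules rather than a single one.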
Keywords/Search Tags: data mining, association rules, sampling, update, confidence interval