
The FP-Growth-BIT Algorithm Based on Incremental Mining

Posted on: 2015-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: G N Cui
Full Text: PDF
GTID: 2268330428462764
Subject: Applied Statistics
Abstract/Summary:
With the advancement of technology, the means of acquiring data keep multiplying and the volume of data keeps growing, yet the tools for analyzing and processing such massive data have not kept pace. Data mining emerged to meet this need, with association rule mining among its most important techniques and the Apriori algorithm as the most classic association rule algorithm. However, Apriori must scan the database multiple times and generates a large number of candidate sets. Based on an in-depth study of association rule algorithms, this thesis analyzes and summarizes several improvements that raise their efficiency. First, in contrast to Apriori, the FP-Growth algorithm uses an FP-Tree to store the raw transaction data in compressed form, turning the problem of mining frequent itemsets into the problem of mining the FP-Tree. This reduces the number of database scans, and the method is widely used in association rule mining. However, FP-Growth relies heavily on a fixed minimum support and a fixed database: when the minimum support or the transaction database changes even slightly, the previously mined frequent itemsets are no longer valid, and the database must be rescanned to mine the new itemsets. To address this problem, this thesis proposes the BIT (Batch Incremental Tree) algorithm, which can reuse the results of previous mining runs. Experiments show that the BIT algorithm is considerably more efficient than both Apriori and FP-Growth.

Finally, the thesis examines the incremental updating of association rules in depth through a case study. With the BIT algorithm, association rules can be updated more easily when the minimum support increases or when the transaction database is updated. In these cases there is no need to rescan the original database: by reusing the original frequent itemsets effectively, the frequent 1-itemsets under the new support can be derived directly. Items that are no longer needed, that is, items whose support count is below the new support count, are removed from the transactions, which saves the time spent processing unnecessary items and searching for shared prefixes and reduces the amount of computation. The mining results are consistent with those of FP-Growth, while the running time is greatly reduced. The case analysis thus shows not only that BIT is more efficient than FP-Growth, but also that the new method is feasible and broadly applicable in practice.
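
The incremental idea described in the abstract can be illustrated with a small Python sketch. This is not the thesis's BIT algorithm, whose details the abstract does not give; it only shows, under simplifying assumptions, how frequent 1-itemsets can be updated without rescanning the original database. The function names (`count_items`, `frequent_items`, `raise_support`, `add_batch`) and the toy data are hypothetical, and the sketch assumes that the support counts from the first scan are kept in memory.

```python
from collections import Counter


def count_items(transactions):
    """Single pass over a set of transactions: support count per item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def frequent_items(counts, min_count):
    """Frequent 1-itemsets: items whose support count reaches min_count."""
    return {item: c for item, c in counts.items() if c >= min_count}


def raise_support(prev_frequent, new_min_count):
    """Minimum support increased: filter the previous result, no rescan.

    Valid only when the threshold goes up, because every item that is
    frequent under the higher threshold was already frequent before.
    """
    return {item: c for item, c in prev_frequent.items() if c >= new_min_count}


def add_batch(prev_counts, new_transactions):
    """Database updated: scan only the new batch and merge its counts.

    Assumption: prev_counts holds counts for ALL items, not only the
    previously frequent ones, so that newly frequent items are detected.
    """
    return prev_counts + count_items(new_transactions)


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"], ["a", "c"]]
    counts = count_items(db)                    # the only scan of the original data
    old = frequent_items(counts, 2)             # mined under the old support count
    print(raise_support(old, 4))                # reuse the old result
    updated = add_batch(counts, [["d", "a"], ["d"]])
    print(frequent_items(updated, 3))           # only the new batch was scanned
```

A full incremental miner would also have to update the tree structure and the longer itemsets; the sketch covers only the 1-itemset step that the abstract describes explicitly.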
Keywords/Search Tags:Data mining, Association rules, Apriori algorithm, FP-Growth algorithm, BIT algorithm