Font Size: a A A

Research Of Fast Association Rule Mining Method Based On Equivalence Class Transformation

Posted on:2020-12-11Degree:MasterType:Thesis
Country:ChinaCandidate:P B TianFull Text:PDF
GTID:2428330590474193Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Association rule mining can discover the relationship between transactions.It has the advantages of simple implementation and strong interpretability,and has been applied to many fields.However,as the scale of data expands,it is often impossible to mine out the rules timely.Therefore,many scholars are committed to improving the efficiency of mining association rules.The main research work of this paper is based on the Eclat algorithm.The Eclat algorithm uses the vertical database to perform frequent itemset mining,which making full use of the advantage of the vertical database,transforming the process of statistic the support of an itemset into compute the intersection size of two set s.The Eclat algorithm needs to compute the support of the itemset by frequently calculating the intersection size when executing.Therefore,when the efficiency of computing the set intersection is low,frequently computing the intersection will seriously affect the execution speed of the Eclat algorithm.In order to solve the problem of many invalid calculations caused by a large number of infrequent itemsets in the process of mining association rule,This paper proposes the Eclat_LSH and Eclat_LSHCF algorithms from the perspective of pruning strategy.And proposes the Sim-Eclat algorithm from the perspective of mining approximate result.The Eclat_LSH algorithm improve the efficiency of frequent itemset mining by reducing the elements that need to be compared:(1)By using the local sensitive hash technique,the Eclat_LSH algorithm transforming the process of computing the intersection of two large sets into the process of obtaining several small set intersections and then accumulating,which reduced the number that each element needs to be compared.(2)In the process of computing the support of the itemset,the Eclat_LSH algorithm makes full use of the concept of minimum support,and proposes the concept of the upper bound of the itemset support,it can be used to stop the process of computing the intersection of two sets in advance.The experimental results show that the above two methods can greatly improve the efficiency of frequent itemset mining.The Eclat_LSHCF algorithm improve the efficiency of frequent itemset mining by reduce the number of calculation intersections: it uses the data structure Cardinality Filter(CF)proposed by Takuma,CF can compute the upper bound of the intersection of two sets quickly.By combining CF with Eclat_LSH algorithm,we can compute the upper bound of the support of the candidate itemset through CF before computing the support of the itemset,and then use the upper bound to prune the infrequent itemset quickly.The experimental results show that this method can reduce many unnecessary calculations and improve the efficiency of the algorithm.The Sim-Eclat algorithm improves the efficiency of the algorithm from the perspective of mining approximate association rule.This algorithm uses the MinHash technique to estimate the support the itemset quickly.In addition,considering the case that the estimated support may has error,we proposed the concept of the easy-mixed boundary Boundary.For those itemset that the estimation of the support near the minimum support.The support is recalculated with the real set,which can improve the mining precision of the algorithm.This topic also proves that the mining error of the Sim-Eclat algorithm is controllable.The experimental results show that the Sim-Eclat algorithm alleviates the bottleneck of Eclat's inefficiency in calculating the support of the itemset,and greatly speed up the speed of the algorithm.
Keywords/Search Tags:association rule, frequent itemset, hash technique, pruning strategy
PDF Full Text Request
Related items