
Design And Application Research Of Association Rule Mining Algorithm Based On Bitset Compression Technology

Posted on: 2021-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: M N Li
GTID: 2428330602497076
Subject: Computer application technology
Abstract/Summary:
Data technology remains an active area of development, and technological advances have led to the accumulation of data sets in every field. Data mining is an important means of obtaining information from data. As data volumes grow rapidly, how to improve the efficiency of data mining and apply it effectively in different fields remains an open concern. Association rule mining, one of the basic data analysis functions, discovers potential connections between seemingly random data in a data set. The Apriori algorithm, the classic algorithm for this task, must scan the database multiple times when mining frequent item sets and lacks a suitable pruning strategy. It therefore generates too many candidate sets, which makes the algorithm inefficient and places a heavy load on memory. Many scholars have proposed improvements to Apriori to address this problem. Among them, the bitmap-based MBSA (Map-based Bit Set Association Rule) algorithm maps the data into a bitmap after scanning the database once and uses bitmap logic operations to implement the connection operation, which gives it high mining efficiency.

After studying the bitmap-based algorithm and applying it to real data sets, we found that, because of the characteristics of the data, converting some data sets into bitmaps produces a great number of zero values, which take up a lot of storage during bit operations. In the bitmap, a 1 at a given position means that the item occurs there, and a 0 means that it does not. A zero value is meaningful only when it is combined with a 1 value in a bit operation during the connection step; operations between large numbers of zero values are meaningless. Storing a large number of zero values therefore reduces memory utilization and operation efficiency.

To solve this problem, this paper proposes an improved association rule mining algorithm based on compressed bitmaps. The algorithm stores and operates on only the 1 values in the bitmap structure, which are called valid values. It uses an array to store the position indexes of the valid values, which achieves a simple compression of the bitmap and reduces the memory load. Based on the new storage method, the connection algorithm is redesigned to use array intersection, together with a better intersection strategy that reduces computation time. The array obtained by the intersection is the storage array of the new candidate set, and its size is the support of the new candidate set. In the connection step of the traditional bitmap algorithm, frequent items are not pruned effectively, and the many combinations of item sets generate a large number of candidate sets, which reduces efficiency. This paper therefore applies an optimized pruning strategy when generating candidate sets to eliminate useless candidates and improve computational efficiency. Although array intersection is not as efficient as a bit operation, compressing the data reduces both the amount of data and the amount of calculation. In terms of overall performance, the optimized algorithm improves efficiency.
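
The storage and connection scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (`build_index_arrays`, `intersect_sorted`, `frequent_pairs`) are hypothetical, the two-pointer merge is only one possible stand-in for the unspecified "better intersection strategy", and the optimized pruning strategy is not reproduced because the abstract does not describe it. The sketch only shows how an item's bitmap row can be stored as an array of the positions of its 1 values, and how the length of an intersection gives the support of a candidate set.

```python
from itertools import combinations


def build_index_arrays(transactions):
    """Scan the transactions once and record, for each item, the ascending
    positions (transaction indices) at which it occurs. These are the
    positions of the 1 values in that item's bitmap row."""
    index = {}
    for pos, transaction in enumerate(transactions):
        for item in set(transaction):  # ignore repeats within a transaction
            index.setdefault(item, []).append(pos)
    return index


def intersect_sorted(a, b):
    """Two-pointer intersection of two ascending position arrays.
    Assumption: a stand-in for the intersection strategy, which the
    abstract does not specify."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def frequent_pairs(transactions, min_support):
    """Join frequent single items by array intersection. The length of each
    intersection is the support count of the candidate pair."""
    index = build_index_arrays(transactions)
    frequent_items = {
        item: positions
        for item, positions in index.items()
        if len(positions) >= min_support
    }
    pairs = {}
    for x, y in combinations(sorted(frequent_items), 2):
        positions = intersect_sorted(frequent_items[x], frequent_items[y])
        if len(positions) >= min_support:
            pairs[(x, y)] = positions
    return pairs
```

For example, with the four transactions `[["a", "b", "c"], ["a", "c"], ["b", "d"], ["a", "b", "c"]]`, the bitmap row of item `a` (1101, reading position 0 first) is stored as `[0, 1, 3]`, and intersecting it with the array for `b`, `[0, 2, 3]`, yields `[0, 3]`, so the candidate set {a, b} has a support count of 2.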
Keywords/Search Tags: Apriori Algorithm, Association Rules, Proportion, Bitset