Research And Improvement For Apriori Algorithm Of Association Rule

Posted on: 2013-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
Full Text: PDF
GTID: 2248330377952366
Subject: Communication and Information System
Abstract/Summary:
The ability to produce and collect data using information technology has improved greatly in recent years, and the volume of data has expanded rapidly as a result. New technologies and tools are therefore needed to help people analyze these vast amounts of data intelligently, since discovering useful knowledge for decision-making otherwise costs a great deal of money and time. Facing the challenge that 'people are submerged in data, yet still hunger for it', data mining technology came into being and has flourished. Data mining is one of the most active research fields, especially within artificial intelligence and database research. It is a process that reveals potentially useful knowledge in massive data. Association rule mining is a main topic in data mining, and the discovery of frequent itemsets is a key problem in association rule mining.

In this thesis, the classic association rule mining algorithm, the Apriori algorithm, and the basic theory of data mining and association rules are described in detail. Analysis of the classic Apriori algorithm reveals some limitations, such as redundant rules, low efficiency, and the inability to mine quantitative rules. To address these bottlenecks, two approaches to improving the algorithm are proposed:

1. To overcome the efficiency bottleneck of the classic Apriori algorithm, an approach based on Bitset is put forward: the B_Apriori algorithm, which exploits the small memory footprint and fast logical operations of Bitset. The algorithm builds the transactional Bitset by scanning the database once. It determines the frequent itemsets using the logical 'and' operation of Bitset and a bit-count operation. To improve the joining and pruning strategies, it uses the logical 'or' operation of Bitset and counts how often each operation result is repeated when generating candidate itemsets. Experiments show that the running time of B_Apriori decreases sharply compared with the Apriori algorithm. The algorithm avoids repeated database scans and complicated joining and pruning operations, and thus increases the efficiency of Apriori.

2. In the classic Apriori algorithm, counting an item in the transaction database requires scanning the database repeatedly, which is costly. To resolve this problem, a new improved algorithm, Apriori_Matrix, is proposed, which combines the mathematical concepts of the vector matrix and the inner product. Apriori_Matrix improves the original algorithm in three respects: it reduces the number of itemsets generated in the candidate set Ck, reduces the number of operations in the pruning process, and reduces the number of transactions in the database that must be scanned during the support-counting stage. In addition, vector and bit operations are fast, and the program is easier to implement. Experiments show that the new algorithm greatly reduces the cost and improves the time efficiency of the system.
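
The bit-vector and inner-product ideas described in the abstract can be illustrated with a small sketch. The Python code below is not the thesis's implementation. All function names are hypothetical, Python integers stand in for a Bitset, and candidate generation uses the textbook Apriori join and prune rather than the 'or'-based strategy of B_Apriori, which the abstract does not describe in enough detail to reproduce.

```python
from itertools import combinations


def build_item_bitsets(transactions):
    """Scan the database once: map each item to an int whose bit t is set
    when transaction number t contains that item."""
    bitsets = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitsets[item] = bitsets.get(item, 0) | (1 << t)
    return bitsets


def support(itemset, bitsets):
    """Support count of a non-empty itemset: AND the item bitsets together,
    then count the bits that remain set."""
    masks = [bitsets.get(item, 0) for item in itemset]
    acc = masks[0]
    for mask in masks[1:]:
        acc &= mask
    return bin(acc).count("1")


def support_inner_product(a, b, transactions):
    """Same support count for a 2-itemset, computed as the inner product
    of the two items' 0/1 column vectors."""
    col_a = [1 if a in t else 0 for t in transactions]
    col_b = [1 if b in t else 0 for t in transactions]
    return sum(x * y for x, y in zip(col_a, col_b))


def frequent_itemsets(transactions, min_support):
    """Level-wise search; supports come from the bitsets, so the
    transaction list is never rescanned after the first pass."""
    bitsets = build_item_bitsets(transactions)
    current = [frozenset([i]) for i in sorted(bitsets)
               if support([i], bitsets) >= min_support]
    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset, bitsets)
        k += 1
        # Textbook join: unions of two frequent (k-1)-itemsets of size k.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        previous = set(current)
        # Textbook prune: every (k-1)-subset must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(sub) in previous
                          for sub in combinations(c, k - 1))
                   and support(c, bitsets) >= min_support]
    return result
```

Calling `frequent_itemsets(transactions, min_support)` on a list of item sets returns a dictionary from each frequent itemset to its support count. For any pair of items, `support` and `support_inner_product` give the same count, which is the equivalence that both proposed algorithms rely on.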
Keywords/Search Tags: data mining, association rule, Apriori algorithm, Bitset