
Research On Apriori Algorithm In Association Rules Mining

Posted on: 2017-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: L Zeng
Full Text: PDF
GTID: 2348330485981662
Subject: Computer application technology
Abstract/Summary:
Today, with the rapid growth of data, how to extract useful and valuable information from large volumes of data has become an attractive research topic worldwide, and data mining has therefore become a very popular research field. As one of the most important branches of data mining, association rule mining has been studied by many scholars around the world, but many problems remain to be solved. As the most classic association rule mining algorithm, Apriori can effectively mine the frequent item sets of a transaction database, and association rules can then be derived quickly and accurately from those frequent item sets. However, Apriori has two significant shortcomings: it must scan the database repeatedly, and it generates a large number of useless candidate item sets. These two drawbacks pose a great challenge for association rule mining. Building on an in-depth analysis of the Apriori algorithm, this thesis focuses on improving it, mainly in the following aspects:

1) Because the traditional Apriori algorithm incurs high I/O cost and generates a large number of redundant candidate item sets, its efficiency is very low. An improved Apriori algorithm based on a mapping structure, named Mapping_Apriori, is therefore proposed. First, the improved algorithm stores the transaction database in the mapping structure, which compresses the database effectively and greatly reduces the I/O burden. At the same time, the mapping structure allows the support of candidate item sets to be calculated quickly, which reduces computational complexity. Finally, based on the properties of frequent item sets, the frequent item sets L_{k-1} can easily be slimmed down in advance, which avoids generating useless candidate item sets. Experiments verify the validity of the algorithm and show improved running efficiency.

2) The frequent pattern growth algorithm, FP-Growth, is studied in depth. It can find frequent item sets without generating candidate item sets, and its core is the FP-tree, which compresses the transaction database effectively. Based on the advantages of the FP-tree, a new improved algorithm named FP_Apriori is put forward. The improved algorithm projects the transaction database onto an FP-tree, which effectively avoids the enormous I/O overhead. At the same time, an improved, more targeted search strategy on the FP-tree reduces the running time of the algorithm. Finally, the improved algorithm also reduces the size of the frequent item sets in advance, using the same principle as Mapping_Apriori. The efficiency of the Apriori, Mapping_Apriori, and FP_Apriori algorithms is compared experimentally, with satisfactory results.

3) To overcome the inherent limitations of the traditional serial algorithm, MapReduce is used to parallelize the algorithm. To fit the MapReduce model better, a new improved algorithm named BM_Apriori is proposed and ported to the Hadoop platform. Experiments verify the parallel efficiency of the algorithm, which may provide a useful reference for big data processing.
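
To make the ideas in the abstract concrete, the following is a minimal Python sketch of the classic Apriori loop (join, prune, count). For support counting it assumes a simple item-to-transaction-id mapping, so the database is read only once and each candidate's support comes from a set intersection. The abstract does not specify the thesis's own mapping structure, so this is not the Mapping_Apriori algorithm itself; the names `build_tidsets` and `apriori` are hypothetical, and `min_support` is assumed to be an absolute transaction count.

```python
from itertools import combinations


def build_tidsets(transactions):
    """Map each item to the set of transaction ids that contain it.

    Assumed illustration of a "mapping structure": built in one pass,
    after which the raw transactions are never scanned again.
    """
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def apriori(transactions, min_support):
    """Return {frozenset(item set): support count} for all frequent item sets."""
    tidsets = build_tidsets(transactions)

    # L1: frequent 1-item sets, each kept together with its tid set.
    current = {
        frozenset([item]): tids
        for item, tids in tidsets.items()
        if len(tids) >= min_support
    }

    frequent = {}
    k = 1
    while current:
        frequent.update({itemset: len(tids) for itemset, tids in current.items()})
        k += 1
        next_level = {}
        # Join step: combine pairs of frequent (k-1)-item sets into k-item candidates.
        for a, b in combinations(list(current), 2):
            candidate = a | b
            if len(candidate) != k or candidate in next_level:
                continue
            # Prune step: every (k-1)-subset of a frequent item set must be frequent.
            if any(frozenset(sub) not in current
                   for sub in combinations(candidate, k - 1)):
                continue
            # Support via tid-set intersection instead of a database rescan.
            tids = current[a] & current[b]
            if len(tids) >= min_support:
                next_level[candidate] = tids
        current = next_level
    return frequent
```

Calling `apriori(transactions, min_support)` on a list of item sets returns every frequent item set with its support count. The prune step is what discards useless candidates before they are counted, and the tid-set intersection is one common way to avoid the repeated database scans named as Apriori's first shortcoming.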
Keywords/Search Tags:Association Rule Mining, Apriori Algorithm, Mapping, FP-tree, Hadoop