Research and Implementation of the Apriori Association Rule Algorithm Based on Cloud Computing

With the rapid development and spread of information media and communication technology, data of all kinds have emerged in large volumes. These data are usually noisy and enormous, and cannot be used or processed directly. Data mining on big data remained an unresolved problem until cloud computing technology offered a way to tackle it. After a careful study of algorithms in cloud computing and data mining, this thesis establishes its research direction and objectives. The environment is configured using the trucked mode of Hadoop, and the whole experiment is completed with the Eclipse development tool and the Java language. Building on the Apriori algorithm, I study new algorithms and compare them with the original by testing different indicators on real records. First, the development trends and current state of research at home and abroad are surveyed to establish the background and significance of research on cloud computing and data mining. Second, the most important components of Hadoop, HDFS and MapReduce, are introduced in detail. Then two improved algorithms are described in depth, each in its own chapter. The first combines an existing improved Apriori algorithm based on the power set with a support counting scheme; the resulting algorithm is named RDBSC_Apriori. The second combines the matrix-based Apriori algorithm with a pruning step to form another improved algorithm, named PMBSC_Apriori.
Finally, experiments are run in the environment built above, focusing on the size of the data set, the number of panels, and the speed-up ratio. Analysis of the experimental data demonstrates the feasibility and accuracy of the improved algorithms. The whole system ran normally and all indicators met the set standards. The experiments nevertheless contain oversights and deficiencies, so further exploration and research on the algorithms remains necessary.
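For orientation, the classic Apriori baseline that both improved algorithms start from can be sketched as below. This is a minimal single-machine illustration in Python, not the thesis's Java and Hadoop implementation, and it does not reproduce RDBSC_Apriori or PMBSC_Apriori. The function name `apriori` and the parameter `min_support` (an absolute count) are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items to get the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets whose union has k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Support counting: one scan of the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

The support counting scan is the step that repeats over the whole data set at every level, which is why it is the natural part to distribute with MapReduce and the natural target for the support counting and pruning improvements described above.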
Keywords/Search Tags: Data mining, Association rules, Power set, Matrix, Apriori algorithm
