An Improved Method Of Apriori Algorithm Based On Hadoop

Posted on: 2017-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
Full Text: PDF
GTID: 2308330482495689
Subject: Software engineering
Abstract/Summary:
With the development of society and the progress of science and technology, Internet and computer technology has penetrated all aspects of people’s lives. The use of the Internet and computers produces ever more data and information, and with it the problem of how to store and use that data. Research and experiment have shown that useful knowledge and information often lie hidden in large volumes of data, and this observation led to the birth of data mining. Data mining techniques can be applied to data generated in all aspects of people’s lives to find meaningful rules, and objective laws can in turn be derived from those rules. Among the techniques covered by data mining, the Apriori algorithm is the basic algorithm for mining association rules. It works iteratively, generating frequent (k+1)-itemsets from frequent k-itemsets until no new frequent itemsets can be produced. The Apriori algorithm can accurately dig out items in a database that are associated with one another, such as goods that are often purchased together, packages that telephone customers often use together, and physical examination items and drugs that patients often use together. Although the Apriori algorithm is of great significance for the discovery of association rules, with the explosion of data volumes it still suffers from low efficiency in practice. It is therefore hoped that, through improvement or porting, the algorithm can avoid or reduce unnecessary and time-consuming work and so become more efficient. Because a single computer is insufficient for dealing with large-scale data, and because the value of big data is increasingly important, cloud computing emerged as a powerful tool for processing large-scale data.
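The level-wise iteration described above (frequent k-itemsets used to generate frequent (k+1)-itemsets until none remain) can be sketched in a few lines of Python. This is a minimal illustration of the classic, single-machine Apriori procedure, not the improved algorithm proposed in the thesis; the function name `apriori`, the absolute-count threshold `min_support`, and the sample transactions are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset whose count >= min_support.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items with one scan of the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 1
    while current:
        prev = set(current)
        candidates = set()
        # Join step: merge pairs of frequent k-itemsets into (k+1)-candidates.
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k + 1:
                # Prune step: every k-subset of a candidate must be frequent.
                if all(frozenset(sub) in prev for sub in combinations(union, k)):
                    candidates.add(union)
        # Count step: one more scan of the data per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
```

Each level needs a full scan of the transactions, which is the repeated work that makes the algorithm slow on large data sets.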
Cloud computing, as a distributed computing platform, provides strong computing power and large storage space for big data, so that very complicated and time-consuming procedures can be carried out through parallel computing more safely and quickly. As cloud computing technology matures, people tend to port big data processing applications to cloud computing platforms and to combine cloud computing with algorithms for different problems so as to bring their advantages into full play. This paper introduces the background knowledge of data mining and cloud computing, analyzes the working process and characteristics of the Apriori algorithm, and proposes an improved Apriori algorithm that can be ported to the Hadoop platform. It also illustrates and validates the feasibility of the improved algorithm, and offers other researchers an approach to improving the Apriori algorithm.
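To give a rough idea of how the support-counting scan of Apriori might be spread over a platform such as Hadoop, the sketch below simulates one map phase and one reduce phase in plain Python. It is a generic illustration under stated assumptions, not the improved algorithm of the thesis, and it does not use any Hadoop API: the names `map_phase`, `reduce_phase` and `count_candidates`, and the way the data is divided into `splits`, are all hypothetical.

```python
from collections import defaultdict


def map_phase(split, candidates):
    """Mapper: emit (candidate, 1) for each candidate found in a transaction."""
    for transaction in split:
        items = frozenset(transaction)
        for cand in candidates:
            if cand <= items:
                yield cand, 1


def reduce_phase(pairs, min_support):
    """Reducer: sum the emitted counts per candidate and keep the frequent ones."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return {k: v for k, v in totals.items() if v >= min_support}


def count_candidates(splits, candidates, min_support):
    """Run the mapper over every split, then reduce all emitted pairs.

    On a real cluster each split would be handled by a separate worker;
    here the splits are processed one after another in a single process.
    """
    emitted = []
    for split in splits:
        emitted.extend(map_phase(split, candidates))
    return reduce_phase(emitted, min_support)


splits = [
    [{"bread", "milk"}, {"bread", "butter"}],
    [{"bread", "milk", "butter"}, {"milk"}],
]
candidates = [frozenset({"bread", "milk"}), frozenset({"milk", "butter"})]
frequent_pairs = count_candidates(splits, candidates, min_support=2)
```

Because each mapper only reads its own split, the expensive scan is divided among workers, while the reducer combines the partial counts into global support values.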
Keywords/Search Tags:Data mining, Association rules, Frequent itemsets, Hadoop platform, Graphs framework