Research On A Parallel Data Mining Algorithm Apriori

Posted on: 2018-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: R C Li
GTID: 2348330515996694
Subject: Software engineering
Abstract/Summary:
The value of data lies in analyzing it to discover the patterns behind it, which can then guide future work. With advances in the Internet and information technology, all walks of life have accumulated large amounts of data of many types. In China, the first companies to use big data were large Internet companies. They have many customers, and those customers' online behavior generates large volumes of data; by analyzing customers' spending or reading habits, these companies can deliver products and information to them selectively. Big data is also valuable in traditional industries. For example, a power company can use data analysis to predict line loads and then optimize energy reserves and dispatch more precisely, and traditional manufacturers use data feedback to plan the development of next-generation products. In short, using data analysis to guide future work has become a trend, which makes it particularly important to use data effectively and mine the patterns behind it. Data mining technology emerged in this context.

Data mining algorithms can be divided into six categories: association, classification, regression, clustering, prediction, and diagnosis. This thesis focuses on association algorithms. One of the classical algorithms for association rule mining is Apriori, which can accurately find items in the data that are related to one another. A typical application is supermarket shelf placement: goods that are frequently bought together are placed together. However, the original algorithm was not designed with very large data volumes in mind, and because it rescans the entire data set for each itemset length, it can be inefficient on large data sets.

The idea of this thesis is therefore to optimize the Apriori algorithm to a certain extent and port it to the Hadoop platform using MapReduce, turning the traditional Apriori algorithm into a distributed one. Tasks and data can then be distributed across the cluster, improving the efficiency of data mining. Hadoop is a cloud computing platform whose advantage is that it can store and process data on large numbers of inexpensive, not highly reliable machines, and its programming model makes it convenient to turn some serial algorithms into concurrent implementations. This thesis presents background knowledge on Hadoop and association algorithms, discusses the feasibility of implementing the Apriori algorithm with the MapReduce programming framework and deploying it on the Hadoop platform, and demonstrates the effect of this approach on efficiency. It is hoped that this work will serve as a reference for future researchers who port algorithms to cloud platforms.
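The parallelization described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the thesis's implementation: it simulates the map and reduce phases in plain Python instead of using Hadoop's Java MapReduce API, and the function names (`map_phase`, `reduce_phase`, `generate_candidates`, `parallel_apriori`) and the sample baskets are hypothetical. Each loop iteration stands for one MapReduce job: mappers count candidate itemsets in their own data partition, a reducer sums the partial counts into global support, and the next level's candidates are generated from the frequent itemsets using the Apriori pruning rule.

```python
# Sketch only: simulates Hadoop-style MapReduce in-process; all names and data are hypothetical.
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Iterator, List, Set, Tuple

Itemset = FrozenSet[str]
Transaction = List[str]


def map_phase(partition: List[Transaction], candidates: Set[Itemset]) -> Iterator[Tuple[Itemset, int]]:
    """Hypothetical mapper: emit (candidate, 1) for each candidate contained in a transaction."""
    for transaction in partition:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # subset test
                yield candidate, 1


def reduce_phase(pairs: Iterable[Tuple[Itemset, int]]) -> Counter:
    """Hypothetical reducer: sum the partial counts emitted for each candidate key."""
    totals: Counter = Counter()
    for candidate, count in pairs:
        totals[candidate] += count
    return totals


def generate_candidates(prev_frequent: Set[Itemset], k: int) -> Set[Itemset]:
    """Join frequent (k-1)-itemsets into k-itemsets, then prune by the Apriori property."""
    prev = list(prev_frequent)
    candidates: Set[Itemset] = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            if len(union) != k:
                continue
            # Every (k-1)-subset of a frequent k-itemset must itself be frequent.
            if all(frozenset(sub) in prev_frequent for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def parallel_apriori(partitions: List[List[Transaction]], min_support: int) -> Dict[Itemset, int]:
    """One simulated MapReduce pass per itemset length k; min_support is an absolute count."""
    candidates = {frozenset([item]) for part in partitions for t in part for item in t}
    frequent: Dict[Itemset, int] = {}
    k = 1
    while candidates:
        # "Map" over every partition (on a Hadoop cluster these would run on different nodes).
        emitted = [pair for part in partitions for pair in map_phase(part, candidates)]
        # "Reduce" to global support counts, then keep only the frequent candidates.
        counts = reduce_phase(emitted)
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # An empty level yields no candidates, which ends the loop.
        candidates = generate_candidates(set(level), k)
    return frequent


if __name__ == "__main__":
    baskets = [
        [["bread", "milk"], ["bread", "diaper", "beer", "eggs"], ["milk", "diaper", "beer", "cola"]],
        [["bread", "milk", "diaper", "beer"], ["bread", "milk", "diaper", "cola"]],
    ]
    result = parallel_apriori(baskets, min_support=3)
    for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

With a minimum support of 3, the sample data yields eight frequent itemsets, including {diaper, beer} with support 3; {bread, milk, diaper} is generated as a candidate but appears in only two baskets, so it is discarded. On a real Hadoop cluster, a combiner could pre-sum counts within each mapper's output to reduce the data shuffled to the reducers.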
Keywords/Search Tags: Data Mining, Association Rules, Hadoop, MapReduce Programming Framework, Apriori Algorithm