
Research And Improvement Of Apriori Algorithm Based On Hadoop

Posted on: 2017-04-25  Degree: Master  Type: Thesis
Country: China  Candidate: J P Zhao  Full Text: PDF
GTID: 2348330488470971  Subject: Software engineering
Abstract/Summary:
With the rapid development of computers and the Internet, the amount of information in modern society is growing very quickly. This information accumulates into large volumes of data, including personal data and industrial data, and these data are dynamic, heterogeneous and diverse. It is projected that by 2025 more than one third of the data produced each year will reside on, or be processed by, cloud platforms. Extracting valuable information from such massive data requires analyzing and processing it, which raises the question of how to store and analyze it efficiently. Cloud computing has emerged as an answer to this question.

Hadoop, a relatively mature open-source cloud computing framework, addresses these problems in two ways. It stores the data in a distributed file system, which improves read/write throughput and expands storage capacity, and it uses the MapReduce framework to process the data held in that file system, enabling efficient analysis. Hadoop also uses a redundant storage mechanism to keep the data safe.

Against this background, this thesis combines a traditional data mining system with the Hadoop framework and improves Apriori, the association rule mining algorithm widely used in data mining systems. First, the thesis describes cloud computing, the Hadoop Distributed File System (HDFS) and the MapReduce parallel computing framework, designs a Hadoop-based data mining system architecture, and describes the responsibilities of its main modules. Next, the thesis analyzes the Apriori association rule algorithm and improves the classical version by partitioning the database and applying cloud computing on the Hadoop platform, yielding an improved mining algorithm based on MapReduce, whose design is presented in detail. Finally, simulation experiments are carried out on a Hadoop cluster. The results show that the improved MapReduce-based Apriori algorithm has low time complexity when processing massive data, and that its advantage becomes more pronounced as the data size grows.
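The abstract names the idea of dividing the database but does not give the algorithm itself. The following Python sketch assumes the well-known two-phase partition scheme (often called SON): each data split is mined locally with classical Apriori, the union of the local results becomes the global candidate set, and a second pass counts those candidates across all splits. Hadoop's map and reduce tasks are simulated with ordinary in-memory functions; `apriori`, `partitioned_apriori` and the `phase*` helpers are hypothetical names, not the author's code and not a Hadoop API.

```python
import math
from collections import Counter
from itertools import combinations


def apriori(transactions, min_count):
    """Classical level-wise Apriori.

    Returns {frozenset(itemset): support_count} for every itemset that
    occurs in at least `min_count` transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items in one scan.
    item_counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): n for i, n in item_counts.items() if n >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join: merge two frequent (k-1)-itemsets into a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count: one scan over the transactions per level.
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# --- Hypothetical two-phase, partition-based scheme (SON-style) ---
# Each phase*_map call stands in for one map task over one data split.

def phase1_map(split, min_ratio):
    """Mine one split locally with a proportionally scaled threshold."""
    local_min = max(1, math.ceil(min_ratio * len(split)))
    for itemset in apriori(split, local_min):
        yield itemset, 1


def phase1_reduce(pairs):
    """Union of all locally frequent itemsets = global candidate set."""
    return {itemset for itemset, _ in pairs}


def phase2_map(split, candidates):
    """Emit (candidate, 1) for every candidate contained in a transaction."""
    for t in split:
        t = frozenset(t)
        for c in candidates:
            if c <= t:
                yield c, 1


def phase2_reduce(pairs, min_count):
    """Sum the partial counts and keep globally frequent itemsets."""
    totals = Counter()
    for itemset, n in pairs:
        totals[itemset] += n
    return {c: n for c, n in totals.items() if n >= min_count}


def partitioned_apriori(transactions, num_splits, min_ratio):
    """Run both phases sequentially, simulating the two MapReduce jobs."""
    splits = [transactions[i::num_splits] for i in range(num_splits)]
    candidates = phase1_reduce(kv for s in splits for kv in phase1_map(s, min_ratio))
    min_count = math.ceil(min_ratio * len(transactions))
    return phase2_reduce(
        (kv for s in splits for kv in phase2_map(s, candidates)), min_count
    )
```

The partition idea is sound because an itemset that is frequent in the whole database must be locally frequent in at least one split, so phase 1 cannot miss a frequent itemset, and phase 2 removes false positives using exact global counts. Calling `partitioned_apriori(baskets, 2, 0.5)` on a list of five baskets should therefore return the same dictionary as `apriori(baskets, 3)`. In a real Hadoop deployment each phase would presumably be a separate MapReduce job; that job layout is an assumption, since the abstract does not describe it.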
Keywords/Search Tags: Cloud Computing, Data Mining, Hadoop, Apriori Algorithm