
The Study Of The Improvement And Transplantation Of Apriori Algorithm Based On Hadoop

Posted on: 2013-09-02
Degree: Master
Type: Thesis
Country: China
Candidate: A Z Zhu
GTID: 2248330392957220
Subject: Information Science
Abstract/Summary:
Purposes: With the rapid development of computer technology and the Internet, and the maturity and wide adoption of Web 2.0, data is growing explosively. Traditional data mining algorithms are inefficient when dealing with huge amounts of data, and the emergence of cloud computing offers a new way to improve them. Through the power of clustering, cloud computing provides reliable storage and high-speed computation for massive data. Hadoop, a mature open source cloud computing framework, has been widely used in data mining and related areas thanks to its high efficiency, scalability, and low cost. On this basis, this paper integrates Hadoop with a typical data mining system and selects the widely used Apriori algorithm, which belongs to the algorithm module of the new data mining system, for improvement, with the aim of enhancing its efficiency when dealing with massive data.

Methods: The research methods used in this paper include documentary research, the structured approach, the case study method, and comparative analysis. Documentary research helps us understand the current state of related research and provides a theoretical reference for this paper's research. The structured approach is a commonly used method for system analysis, and it guides the analysis of the system architecture of Hadoop-based cloud data mining. This paper describes the execution process of the traditional Apriori algorithm and demonstrates the feasibility of the improved algorithm through an example. Through comparative analysis, this paper analyzes the advantages of the improved algorithm.

Results:
(1) Combining the typical data mining system architecture with Hadoop, the paper proposes a data mining system architecture based on Hadoop and briefly expounds each functional module.
(2) After elaborating on the Apriori algorithm and targeting its bottleneck when dealing with massive data, the paper uses the MapReduce programming model to present an idea for parallel improvement based on partitioning the database. It describes and designs the improved algorithm in detail, and demonstrates and analyzes its feasibility with an example.
(3) The case study shows that the improved algorithm is more efficient, reducing both time complexity and space complexity.

Conclusions:
(1) Cloud computing brings new approaches to data mining algorithms, and cloud data mining will become a future research trend.
(2) This paper has a certain significance in that it provides a reference for improving other data mining algorithms. More and more algorithms will be parallelized and transplanted to the Hadoop cloud data mining platform.
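The abstract does not give the improved algorithm itself, so the following is only a minimal sketch of one common way to parallelize Apriori over a partitioned database in the MapReduce style. It is not the thesis's design. It runs in plain Python on a single machine and does not use Hadoop; the function names (`partition`, `map_count`, `reduce_counts`, `next_candidates`, `parallel_apriori`) and the sample transactions are hypothetical, chosen for illustration. Each partition counts the candidate itemsets locally (the map step), the local counts are summed and filtered by minimum support (the reduce step), and the surviving itemsets generate the next round of candidates.

```python
from collections import Counter
from itertools import combinations


def partition(transactions, n_parts):
    """Split the transaction list into n_parts roughly equal slices."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def map_count(part, candidates):
    """Map step: count each candidate itemset within one partition."""
    counts = Counter()
    for transaction in part:
        for cand in candidates:
            if cand <= transaction:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts


def reduce_counts(partial_counts, min_support):
    """Reduce step: sum per-partition counts and keep frequent itemsets."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori property)."""
    items = sorted({item for itemset in frequent for item in itemset})
    return [
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    ]


def parallel_apriori(transactions, min_support, n_parts=2):
    """Level-wise Apriori; each level is one simulated map/reduce round."""
    parts = partition([frozenset(t) for t in transactions], n_parts)
    all_items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in all_items]
    result = {}
    k = 1
    while candidates:
        # On a real cluster, each map_count call would run on a separate node.
        partial = [map_count(part, candidates) for part in parts]
        frequent = reduce_counts(partial, min_support)
        result.update(frequent)
        k += 1
        candidates = next_candidates(frequent, k)
    return result


# Hypothetical sample data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent_itemsets = parallel_apriori(transactions, min_support=3, n_parts=2)
```

Because the reduce step sums absolute counts, the result does not depend on how the transactions are partitioned; only the work per node changes.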
Keywords/Search Tags: Hadoop, Apriori Algorithm, MapReduce, Association rules, Cloud Computing