Research And Implementation Of Parallel Data Mining Algorithms Based On Cloud Computing

Posted on: 2019-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
GTID: 2428330566499200
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of computer and Internet technology, the volume of global data is growing explosively. Data mining algorithms that run on a single machine are limited by that machine's storage capacity and computing power and cannot meet current information-processing needs. Parallelizing existing data mining algorithms can greatly improve their efficiency. To address this need, this thesis first introduces the components and operating mechanisms of two mainstream cloud computing frameworks, Hadoop and Spark, and then examines the data storage mechanism of HDFS and the programming principles of Hadoop MapReduce and Spark. It next briefly introduces the basic process of data mining and the basic principles of the C4.5 and KNN classification algorithms. Parallelization strategies for the C4.5 classification algorithm are proposed on both the MapReduce and Spark computing frameworks. In addition, to address the shortcomings of the KNN classification algorithm, an improved KNN classification algorithm is presented, and its parallelization strategy is implemented on the Spark computing framework. Finally, a cloud computing platform is set up to compare the parallelized algorithms with the original algorithms. The test results indicate that the cloud-based data mining algorithms offer a substantial advantage in computing speed and an excellent parallel speedup ratio for big data analytics, that Spark outperforms MapReduce for iterative data mining, and that the improved Spark-based KNN classification algorithm is considerably more efficient at classification than the original KNN algorithm.
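The abstract does not spell out the parallelization strategies themselves, so the following is only a minimal sketch of one common way to parallelize KNN classification in a map/reduce style, not the thesis's improved algorithm. It assumes the training set is already split into partitions (as HDFS blocks or Spark partitions would be); the function names `local_top_k` and `knn_classify` and the toy data are hypothetical, and plain Python lists stand in for a real cluster.

```python
import heapq
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def local_top_k(partition, query, k):
    """Map step: return the k nearest (distance, label) pairs in one partition."""
    scored = ((dist(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def knn_classify(partitions, query, k):
    """Reduce step: merge per-partition candidates, keep the global k nearest, vote."""
    # On a cluster each partition would be processed by a separate worker;
    # here the partitions are simply processed one after another.
    candidates = [c for part in partitions for c in local_top_k(part, query, k)]
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy training data, pre-split into two partitions.
partitions = [
    [((0.0, 0.0), "a"), ((0.1, 0.2), "a")],
    [((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((0.2, 0.1), "a")],
]

label_near_origin = knn_classify(partitions, (0.0, 0.1), k=3)
label_near_five = knn_classify(partitions, (5.1, 5.0), k=3)
```

Because each partition returns at most k candidates, the merge step handles only k times the number of partitions, however large the training set is. This is one reason the distance computation, which dominates the cost, distributes well.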
Keywords/Search Tags: Cloud Computing, Data Mining, Spark, Classification, KNN