Research And Implementation Of Data Mining Algorithms Based On Distributed Computing

Posted on: 2017-10-11  Degree: Master  Type: Thesis
Country: China  Candidate: D Qi  Full Text: PDF
GTID: 2348330518495375  Subject: Computer technology
Abstract/Summary:
As Internet access has become more convenient, online activity has grown rapidly, and users now generate large volumes of data. Traditional stand-alone (single-machine) computing increasingly struggles to meet the computing requirements and speed demanded by real business scenarios in the Internet industry. Research on data mining algorithms based on distributed computing helps to handle this growing volume of data, and it requires adapting the theory of traditional single-machine data mining algorithms to a distributed setting. Starting from the single-machine data mining algorithms most widely used today, including classification algorithms such as Naive Bayes and SVM, the association rule algorithm FP-Growth, and clustering algorithms such as Canopy and k-means, this thesis researches and implements distributed versions of these algorithms. It then applies distributed Naive Bayes and FP-Growth association rules to text classification, and applies an improved k-means clustering algorithm in a distributed environment to a microblog hot spot analysis system. The main work of the thesis is as follows:

1. The basic theory of data mining algorithms and the basic design ideas of distributed computing are studied, and the key research content of the thesis is set out: distributed computing versions of the classification algorithms Naive Bayes and SVM, the association rule algorithm FP-Growth, and the clustering algorithms Canopy, k-means, and an improved k-means.

2. Based on that research content, the thesis focuses on data mining algorithms in a distributed environment. Building on the study of each algorithm, the MapReduce programming model on Hadoop is used to implement distributed versions of the classification algorithms Naive Bayes and SVM, the association rule algorithm FP-Growth, and the clustering algorithms Canopy, k-means, and the improved k-means. Comparative experiments on classical data sets are then run for the different distributed algorithms, and their processing efficiency is analyzed.

3. Based on the experimental results and analysis of data mining in the distributed environment, the thesis designs and implements a microblog hot spot analysis system. Experiments show that the approach meets the basic functional requirements of each module in the system, and they verify the performance advantage of the distributed data mining algorithms over single-machine computing.

The microblog hot spot analysis system first combines the distributed Naive Bayes algorithm and the association rule algorithm to partition microblog data by topic. It then applies the improved k-means algorithm proposed in this thesis, running in a distributed environment, to perform hot post analysis on the topic partitions. Finally, the analysis results are assessed against evaluation indicators.
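
To make the MapReduce formulation of clustering more concrete, the following is a minimal single-process sketch of one iteration of standard k-means expressed as a map step and a reduce step. It is an illustration only: it does not use Hadoop, it is not the improved k-means proposed in the thesis (which the abstract does not describe), and the function names `map_assign`, `reduce_mean`, and `kmeans_iteration` are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: average all points that share one centroid index."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce round and return the updated centroids."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_assign(p, centroids)
        groups[key].append(value)  # stands in for the shuffle phase (group by key)
    # Assumption: a centroid that received no points is left unchanged.
    return [
        reduce_mean(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]
```

In a real Hadoop job the map calls would run in parallel over data splits and the grouping would be done by the framework; repeating the iteration until the centroids stop moving gives the usual k-means loop.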
Keywords/Search Tags: distributed data mining, classification algorithm, association rules, clustering algorithm, analysis of microblog hot spots