Research And Implementation Of Data Mining Algorithms Based On Distributed Computing

Posted on:2017-10-11

Degree:Master

Type:Thesis

Country:China

Candidate:D Qi

Full Text:PDF

GTID:2348330518495375

Subject:Computer technology

Abstract/Summary:

As Internet access has become more convenient, online activity has grown rapidly and spread into new areas, and users now generate very large volumes of data. Traditional stand-alone (single-machine) computing is increasingly unable to meet the computing requirements and speed demanded by real business scenarios in the Internet industry. Research on data mining algorithms based on distributed computing helps to cope with this growing volume of data, and it requires carrying the theory of traditional single-machine data mining algorithms over to distributed computing. This thesis starts from the most widely used single-machine data mining algorithms, namely the classification algorithms Naive Bayes and SVM, the association rule algorithm FP-Growth, and the clustering algorithms Canopy and k-means, and studies and implements their distributed counterparts. It then applies distributed Naive Bayes and FP-Growth association rules to text classification, and applies an improved k-means clustering algorithm in a distributed environment to a microblog hot-spot analysis system. The main work of the thesis is as follows:

1. The basic theory of data mining algorithms and the basic design ideas of distributed computing are studied, and the key research content of the thesis is set out: distributed computing versions of the classification algorithms (Naive Bayes and SVM), the association rule algorithm (FP-Growth), and the clustering algorithms (Canopy, k-means, and an improved k-means).

2. Building on this research content, the thesis focuses on data mining algorithms in a distributed environment. The classification algorithms (Naive Bayes and SVM), the association rule algorithm (FP-Growth), and the clustering algorithms (Canopy, k-means, and the improved k-means) are implemented with the MapReduce programming model in a distributed Hadoop environment. Comparative experiments on classical data sets are then carried out for the different distributed data mining algorithms, and their processing efficiency is analyzed.

3. Based on the experimental results and analysis of the data mining methods in the distributed environment, the thesis designs and implements a microblog hot-post analysis system. The system first combines the distributed Naive Bayes and association rule algorithms to partition microblog data by topic, then applies the improved k-means algorithm proposed in the thesis, running in the distributed environment, to analyze hot posts within each topic partition, and finally assesses the analysis results against evaluation indicators. Experiments show that this approach supports the basic functions of each module of the microblog analysis system and confirm the performance advantage of the distributed data mining algorithms over single-machine computing.
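The abstract does not give the details of the MapReduce implementations or of the improved k-means algorithm. As a rough illustration only, one iteration of standard k-means can be expressed in map and reduce steps as in the sketch below. It is plain Python that simulates the shuffle in memory, not Hadoop code, and the function names `map_point`, `reduce_cluster`, and `kmeans_iteration` are hypothetical.

```python
from collections import defaultdict
from math import dist


def map_point(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step: average all points assigned to one centroid."""
    total = None
    count = 0
    for point, n in values:
        total = list(point) if total is None else [a + b for a, b in zip(total, point)]
        count += n
    return tuple(x / count for x in total)


def kmeans_iteration(points, centroids):
    """One standard k-means iteration written as map, shuffle, reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)  # shuffle: group emitted values by centroid index
    # Assumption: a centroid with no assigned points keeps its previous position.
    return [reduce_cluster(groups[i]) if i in groups else tuple(centroids[i])
            for i in range(len(centroids))]


# Toy data, made up for illustration.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, centroids)
```

In a real Hadoop job the map and reduce functions would run on separate nodes, and the driver would repeat the iteration until the centroids stop moving. The thesis's improved k-means may differ from this baseline.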

Keywords/Search Tags:

distributed data mining, classification algorithm, association rules, clustering algorithm, analysis of micro blog hot spots

Related items

1	Classification Association Rule Induction Algorithm And Applied Research
2	Research On Association Rules Mining In Data Streams And Its Application
3	The Research Of Application On Medical Data Process And Mining Algorithm Of Association Rules
4	Association Rules Mining And Its Applications In Microarray Gene Expression Data
5	Study Of Compensated Fast Distributed Mining Algorithm Of Association Rules
6	Research On The Optimization Of Association Rules
7	Kernel-based Adaptive Fuzzy C-means Clustering Algorithm Based On Fruit Fly Algorithm And Association Rule Mining
8	Research On Distributed Association Rules Mining Algorithm And Its Applications
9	Data Mining Technology In The IT Public Course Evaluation
10	The Applied Research Of Association Rules Mining Based On Colony Algorithm In Marketing