
Research On Distributed Clustering Algorithm Based On Spark And Implementation On Social Media Analysis

Posted on: 2019-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Bian
Full Text: PDF
GTID: 2348330542498760
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the big data era, traditional machine learning algorithms face new challenges. Relying solely on Moore's Law is no longer enough for large-scale data computation. Advances in cloud computing and distributed platforms make it possible to bring all of the data into the analysis. Social media has become an important source of big data, and this data contains much valuable information about users. This thesis designs and implements parallel clustering algorithms on the Spark parallel framework and applies them to the analysis of Weibo data to achieve clustering-based recommendation for Weibo.

The thesis consists of the following parts. First, it gives a brief introduction to clustering and to the clustering algorithms most closely related to this work, and introduces the distributed computing framework Spark. Second, it presents the principles, design details, and improvements of three distributed clustering algorithms: distributed CLARA, distributed DisAP, and distributed p-CLOPE. Experiments verify the effectiveness of the distributed clustering algorithms and the parallel speedup achieved on the Spark platform. Third, based on the three clustering algorithms, a prototype system for Weibo data clustering analysis is built. The architecture, design, and implementation of the prototype system are introduced, along with the steps that precede the cluster computation: data acquisition, data preprocessing, and feature extraction. The system is then used to carry out cluster analysis of Weibo data. Finally, the thesis introduces the BDAP big data mining platform and its components, including the platform structure and the component integration mechanism. The integration of the clustering algorithms is explained in detail, including the integration of each module, module design and implementation, and the final standardized integration into the system.
Keywords/Search Tags: clustering, Spark, distributed system, Weibo analysis
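The abstract names CLARA as one of the three algorithms but does not describe how the thesis distributes it on Spark. As background only, the following is a minimal single-machine sketch of the general CLARA idea in plain Python: draw several random samples, run a small swap-based k-medoids search on each sample, and keep the medoids that give the lowest cost on the full data set. All function names and parameters (`clara`, `pam`, `n_samples`, `sample_size`, `seed`) are illustrative assumptions and are not taken from the thesis. Because each sample is processed independently, this structure is a natural candidate for parallel execution, though the thesis's actual design may differ.

```python
import random


def dist(a, b):
    # Euclidean distance between two equal-length tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids):
    # Sum of distances from every point to its nearest medoid.
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(sample, k, rng):
    # Simplified swap-based k-medoids on a small sample (hypothetical helper):
    # accept any swap that strictly lowers the cost, stop when none does.
    medoids = rng.sample(sample, k)
    best = total_cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in sample:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(sample, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, n_samples=5, sample_size=40, seed=0):
    # Cluster several random samples and keep the medoid set that is
    # cheapest when evaluated on the full data set.
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k, rng)
        cost = total_cost(points, medoids)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```

A call such as `clara(points, k=2)` returns the chosen medoids together with their total cost on `points`, where `points` is a list of numeric tuples. In a distributed setting the loop over samples is the part that could be farmed out to workers, with only the candidate medoid sets and their costs sent back for comparison.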