Research On Cluster Analysis Based On Open-source Cloud Computing Platform With Hadoop

Posted on:2016-08-14

Degree:Master

Type:Thesis

Country:China

Candidate:J P Ren

Full Text:PDF

GTID:2298330452971215

Subject:Software engineering

Abstract/Summary:

With the wide application of data collection tools and the rapid development of the Internet, traditional clustering algorithms struggle to meet requirements when dealing with big data. Cloud computing platforms, which evolved from parallel computing, emerged in response. With their distributed, heterogeneous and other features, cloud computing applications are well suited to large-scale data processing.

Research on improved algorithms mainly covers traditional clustering methods extended with the concepts of data field, grid, increment, parallelism and MapReduce; of these, the MapReduce model is the most widely used to improve the efficiency of clustering algorithms. As data volumes increase, processing large data on cloud computing platforms has become a hot spot, and research on data mining algorithms on such platforms has gradually become a hot topic. Current work mainly includes: studying the general rules of parallel algorithms; finding the relationship among data size, algorithm complexity and the number of nodes; identifying the factors behind speedup and scalability; and, finally, designing efficient parallel clustering algorithms.

Three new algorithms based on a cloud computing platform are proposed in this thesis.

(1) For processing massive data, a MapReduce-based triangle inequality canopy K-means algorithm is proposed. The algorithm takes advantage of the triangle inequality to reduce redundant computation and running time. The experiments demonstrate that the algorithm reduces I/O and network transmission costs and overcomes the shortcoming of local optima, so it can effectively process big data on the MapReduce framework.

(2) For processing irregularly distributed massive data, a MapReduce-based hierarchical clustering algorithm is proposed. The algorithm uses Mean Shift to preprocess the massive data and builds on the CURE algorithm to implement a MapReduce-based MS-CURE algorithm. The experiments demonstrate that the algorithm achieves a trade-off between efficiency and timeliness, along with better clustering results.

(3) To address the disadvantages of traditional clustering algorithms, namely sensitivity to parameters, high time complexity and restriction to static data, a dynamic and incremental clustering algorithm using references and density (DICURD) is proposed. The novelty of DICURD is that it realizes a dynamic and incremental clustering algorithm based on cloud computing. The experimental results demonstrate that the algorithm reduces parameter sensitivity, improves efficiency and resource utilization, and is suitable for analyzing big data.
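The abstract does not give the details of the triangle inequality canopy K-means algorithm, so the following is only a minimal single-machine sketch of the general idea behind triangle inequality pruning in the K-means assignment step. It is not the thesis's MapReduce implementation and omits the canopy stage. The function names (`euclidean`, `assign_with_triangle_inequality`) and the `skipped` counter are hypothetical and introduced for illustration. The pruning rule used is the standard one: if d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best), so the distance to c_j need not be computed.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length point tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_triangle_inequality(points, centers):
    """Assign each point to its nearest center, skipping provably farther centers.

    Returns (labels, skipped), where skipped counts the point-to-center
    distance computations that the triangle inequality made unnecessary.
    Illustrative sketch only; not the algorithm from the thesis.
    """
    k = len(centers)
    # Center-to-center distances are computed once and reused for every point.
    cc = [[euclidean(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels = []
    skipped = 0
    for p in points:
        best = 0
        best_d = euclidean(p, centers[0])
        for j in range(1, k):
            # Triangle inequality: d(c_best, c_j) <= d(p, c_best) + d(p, c_j).
            # If d(c_best, c_j) >= 2 * d(p, c_best), center j cannot be closer.
            if cc[best][j] >= 2 * best_d:
                skipped += 1
                continue
            d = euclidean(p, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped
```

In a MapReduce setting, a step like this would plausibly run inside each map task, with the current centers distributed to all mappers; that placement is an assumption here, not something the abstract states.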

Keywords/Search Tags:

Big data, Triangle inequality, Mean shift, Dynamic clustering, MapReduce

Related items

1	Research Of Image Clustering Based On Local Structure Constraints
2	Using The Triangle Inequality To Accelerate Cluster Algorithm
3	The Research And Implementation Of Comprehensive Mapreduce
4	Research On Data Cleaning Based On Clustering Algorithm
5	Research Of Dynamic Skyline Query Processing Approach In MapReduce
6	Load Balancing Algorithm Based On Data Skew Of MapReduce
7	Research On Optimized Modulation Technology Of Arbitrary Topological Triangle Meshes
8	Research On Performance Optimization Of MapReduce Model
9	Research On The Parallel Segmentation Algorithms Of Medical Image Based On Mapreduce
10	Research On Clustering Algorithms Of Location Big Data Based On MapReduce