Design And Implementation Of Distributed Clustering Framework Based On Model Fusion

Posted on:2013-09-30

Degree:Master

Type:Thesis

Country:China

Candidate:J Q Li

GTID:2268330392970761

Subject:Software engineering

Abstract/Summary:

With the rapid development of the Internet and social networks, big data analysis and mining has become a universally recognized problem. Clustering, a classic data mining technique, has to be improved on the foundation of a distributed architecture in order to handle large-scale computation and to adapt to today's dispersed, massive data sets. Distributed clustering has therefore become a hot topic in academic research, and algorithmic improvements continue to appear.

Most current distributed clustering algorithms require communication between nodes, which floods the network with redundant data, and they lack a central node to coordinate the overall computation. In approaches that do establish a central node, that node only relays data and cannot play its full role, because the result is affected by the distribution and quality of the data. This thesis combines the two approaches: it establishes a central node and also describes the data by its density distribution. This reduces network data transmission and avoids the impact of unevenly distributed data on the clustering algorithm.

Existing distributed clustering algorithms still have several major problems: the distribution and quality of the data affect the clustering results; a global description of the data is seriously lacking during computation; and computation is inefficient because large amounts of redundant data are transmitted. This work makes improvements against these points.

To address these problems, we first adopt a "one-to-many" mode: a single central node is in charge of the overall situation, accepting and transferring data, while each sub-node is in charge of computing on and reporting its own data. This reduces the resources wasted on transferring data between every pair of nodes. We then describe the distribution of the data through its density, thereby reducing the impact of data distribution and data quality on the clustering. In line with these design goals, we use the classic k-means algorithm as the basic algorithmic framework and adopt Hadoop MapReduce and HDFS for iteration, data storage, and data transmission.

According to the analysis of the experimental results, the proposed framework reduces computation time and improves clustering accuracy to some extent.
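The abstract does not give the implementation, so the following is only a minimal sketch of the general "one central node, many sub-nodes" k-means pattern it describes. It runs in a single Python process rather than on Hadoop, and the function names (`local_summary`, `merge_summaries`, `distributed_kmeans`) are hypothetical. Each sub-node reports only per-cluster sums and counts instead of raw points, which is one plausible way to cut network traffic; the thesis's density-based data description is not modeled here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Summary = Tuple[List[List[float]], List[int]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_summary(points: Sequence[Point], centroids: Sequence[Point]) -> Summary:
    """Sub-node step (plays the role of 'map'; hypothetical helper).

    Assigns each local point to its nearest centroid and returns
    per-centroid coordinate sums and point counts, not the raw points.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        k = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def merge_summaries(summaries: Sequence[Summary],
                    centroids: Sequence[Point]) -> List[Point]:
    """Central-node step (plays the role of 'reduce'; hypothetical helper).

    Combines the sub-node summaries into new centroids. A centroid that
    received no points anywhere is kept unchanged (an assumption).
    """
    dim = len(centroids[0])
    updated: List[Point] = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in summaries)
        if total == 0:
            updated.append(tuple(old))
            continue
        updated.append(tuple(
            sum(sums[k][d] for sums, _ in summaries) / total
            for d in range(dim)
        ))
    return updated


def distributed_kmeans(partitions: Sequence[Sequence[Point]],
                       centroids: Sequence[Point],
                       max_iter: int = 20,
                       tol: float = 1e-9) -> List[Point]:
    """Iterate sub-node summaries and central merges until centroids settle."""
    current = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        summaries = [local_summary(part, current) for part in partitions]
        updated = merge_summaries(summaries, current)
        shift = max(sq_dist(a, b) for a, b in zip(current, updated))
        current = updated
        if shift <= tol:
            break
    return current
```

In this sketch each element of `partitions` stands in for the data held by one sub-node. On a real cluster the per-partition step would run as map tasks and the merge as a reduce task, with the centroids redistributed between iterations.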

Keywords/Search Tags:

distributed clustering, data density description, Hadoop, cluster analysis, k-means

Related items

1	Design And Implementation Of Distributed Text Clustering System Based On K-means
2	Research On Parallelization Of Text Clustering Based On Hadoop
3	Research On Machine Learning Clustering Algorithms In The Hadoop Development Environment
4	Clustering Analysis Based On Hadoop
5	Chroma Clustering Analysis Of Film Poster Based On Hadoop
6	Design And Its Implementation Of Iterative Distributed Clustering Framework Based On Model Fusion
7	A Research And Implementation With Improved K-Means Clustering Algorithm To Image Retrieval System Based On Hadoop Platform
8	Research On Two Improved Density Peaks Clustering Algorithms
9	Research On The Selection Of Initial Cluster Centers In K-means Algorithm
10	Research And Application Of Clustering Algorithms In The Analysis Of The Behavior Of Campus Wireless Network User