Research Of Distributed Clustering Algorithm Based On Density

Posted on:2013-04-12

Degree:Master

Type:Thesis

Country:China

Candidate:R Mao

Full Text:PDF

GTID:2248330371485121

Subject:Computer software and theory

Abstract/Summary:

Traditional clustering analysis methods and their applications mostly assume a centralized data set; that is, the data to be analyzed is stored on a single computer or at a single location. With the rapid development of information and network technology, large-scale application systems and massive data sets now mainly exist in distributed form. Applying traditional clustering methods requires centralizing these distributed data sets first, which seriously harms the efficiency of the clustering analysis and the security of the data, and consumes large amounts of network and storage resources.

Distributed clustering is an important topic in distributed data mining. It performs clustering analysis on massive distributed data, solves the problem that traditional clustering methods cannot be applied in a distributed environment, enables parallel computation over large-scale data, and protects data security, so that clustering analysis is no longer limited by data scale, security, privacy, and other constraints. Distributed clustering is an inevitable trend in the development of clustering analysis.

The density-based distributed clustering method (DBDC) combines the DBSCAN algorithm with a distributed environment. It solves the problem that density-based clustering methods cannot be applied in a distributed environment, and it achieves good clustering accuracy. However, it runs DBSCAN twice, once at the local clustering level and once at the global clustering level, so the time complexity of the whole clustering process is high. It also requires a new clustering radius to be chosen for the global level. This choice affects the global clustering result: with a large global radius, we risk merging clusters that do not belong together; with a small one, we may fail to detect clusters that do belong together. A suitable global clustering radius is therefore difficult to find.

To solve these problems, we propose an improved density-based distributed clustering method. At the local clustering level, the improved method first maps each data object to a cell of a spatial grid; this grid mapping reduces the search space of the DBSCAN algorithm and so improves the efficiency of local clustering. At the global clustering level, we propose a global clustering method based on the intersection of representative points. This method takes full advantage of the properties of the representative points in the local models and uses the central point of the representative points to reduce clustering error. It also requires no clustering parameters at the global level, which makes the improved method more practical for distributed clustering analysis. Experimental results show that the proposed global clustering method is more accurate and more efficient than DBDC.
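The abstract does not give the details of the grid mapping, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes a grid whose cell side equals the DBSCAN radius eps, so that every neighbor within eps of a point lies in the point's own cell or an adjacent one. The names build_grid and region_query are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def cell_of(point, eps):
    """Return the integer grid cell (side length eps) that contains the point."""
    return tuple(floor(coord / eps) for coord in point)


def build_grid(points, eps):
    """Map each grid cell to the indices of the points that fall inside it."""
    grid = defaultdict(list)
    for idx, point in enumerate(points):
        grid[cell_of(point, eps)].append(idx)
    return grid


def region_query(points, grid, eps, idx):
    """Indices of all points within eps of points[idx] (including itself).

    Only the point's own cell and its adjacent cells are scanned, instead of
    the whole data set as in a naive DBSCAN neighborhood search.
    """
    point = points[idx]
    cell = cell_of(point, eps)
    neighbors = []
    for offset in product((-1, 0, 1), repeat=len(cell)):
        candidate = tuple(c + o for c, o in zip(cell, offset))
        for j in grid.get(candidate, ()):
            if dist(point, points[j]) <= eps:
                neighbors.append(j)
    return neighbors


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (0.9, 0.9), (3.0, 3.0)]
    g = build_grid(pts, 1.0)
    print(sorted(region_query(pts, g, 1.0, 0)))
```

A local DBSCAN run could call region_query in place of its usual neighborhood search. The number of cells scanned per query grows as 3^d with the dimension d, so a sketch like this is only attractive for low-dimensional data.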

Keywords/Search Tags:

Clustering, Distributed Clustering, Representative Point, Intersect

Related items

1	Research And Application Of Density Peak Clustering Algorithm Based On Natural Neighbors And Representative Points
2	Based Clustering Algorithm And Its Application To Obtain A Representative Point
3	Study On New Data And Text Clustering Methods Based On Representatives
4	Prototype Selection Algorithm Based On Improved Cure Clustering And Application
5	Research Of Data Stream Clustering Methods Based On Density
6	Distributed Clustering And Evolutionary Clustering Algorithm Based On Semi-supervised Learning
7	Research On Density-based Distributed Clustering Algorithms
8	Research On Clustering Algorithm Based On Distributed Platform
9	Representative Image Selection In Large-scale Image Dataset
10	Parallel Clustering Algorithm Based On MapReduce