Density-based Statistical Merging Clustering Algorithm

Posted on:2017-08-25

Degree:Master

Type:Thesis

Country:China

Candidate:B B Liu

GTID:2348330503495645

Subject:Applied Mathematics

Abstract/Summary:

In recent years, with the rapid development of the national economy and the wide application of network technology, data sources have kept expanding, data sets have grown steadily larger, and data structures have become increasingly complex. How to extract useful information from large-scale data with complex structure has become a current research focus.

As an important data analysis technique in the field of data mining, cluster analysis has a wide range of applications in pattern recognition, information processing, machine learning, and other areas. Because each algorithm has its own initial conditions and clustering criteria, a variety of clustering algorithms have emerged. However, when faced with large data sets that exhibit inter-class similarity, intra-class difference, noise, and overlap, the limitations of existing clustering algorithms are becoming more and more obvious.

Because traditional clustering algorithms handle noise and overlap poorly, this thesis proposes a density-based statistical merging clustering algorithm (DSM) from a statistical point of view. The algorithm treats each feature of the data points as a set of independent random variables and derives a statistical merging criterion from the independent bounded difference inequality. Combined with the density information of the data points, DSM uses descending order of density as the merging order during agglomeration, and achieves statistical merging of data points belonging to different types. Experimental results on artificial and real data sets show that DSM not only handles convex data sets but also clusters well on data sets that are non-convex, overlapping, or noisy. This demonstrates that the algorithm has good applicability and validity.

To address the failure of traditional clustering algorithms on large-scale data, the thesis proposes a density-based statistical merging algorithm for large data sets (DSML) from the point of view of data sampling. This algorithm extends DSM to a wider range of applications. First, DSML obtains a new sampling algorithm (the Statistical Leaders algorithm) by improving the Leaders algorithm with the statistical merging criterion. Second, by combining the Statistical Leaders algorithm with DSM, DSML completes the clustering of the whole data set. Theoretical analysis and experimental results show that DSML obtains a more representative sample set, has nearly linear time complexity, can handle arbitrary data sets, and is insensitive to noisy data, all of which help in dealing with large-scale data sets.
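The abstract names two building blocks without giving their formulas: a merging criterion derived from the independent bounded difference (McDiarmid) inequality, and the Leaders sampling algorithm. The sketch below is only an illustration of those two ideas, not the thesis's actual formulation. It assumes every feature is scaled to an interval of width `value_range`, in which case the inequality bounds the gap between two sample means of sizes n1 and n2 by `value_range * sqrt(0.5 * (1/n1 + 1/n2) * ln(2/delta))` with probability at least 1 - delta. The function names, the `delta` parameter, and the distance threshold `tau` are hypothetical choices made for this example.

```python
import math


def merge_threshold(n1, n2, value_range=1.0, delta=0.05):
    """Bound on |mean_1 - mean_2| for two samples from the same distribution.

    Follows from the bounded difference (McDiarmid) inequality when each
    feature value lies in an interval of width `value_range` (assumption).
    """
    return value_range * math.sqrt(
        0.5 * (1.0 / n1 + 1.0 / n2) * math.log(2.0 / delta)
    )


def should_merge(cluster_a, cluster_b, value_range=1.0, delta=0.05):
    """Return True if every per-feature mean gap stays within the bound."""
    bound = merge_threshold(len(cluster_a), len(cluster_b), value_range, delta)
    for d in range(len(cluster_a[0])):
        mean_a = sum(p[d] for p in cluster_a) / len(cluster_a)
        mean_b = sum(p[d] for p in cluster_b) / len(cluster_b)
        if abs(mean_a - mean_b) > bound:
            return False
    return True


def leaders(points, tau):
    """Classic single-pass Leaders algorithm with distance threshold tau.

    Each point joins the first leader within distance tau; otherwise it
    becomes a new leader. This is the plain version, not the thesis's
    Statistical Leaders variant.
    """
    leader_list = []
    followers = []
    for p in points:
        for i, leader in enumerate(leader_list):
            if math.dist(p, leader) <= tau:
                followers[i].append(p)
                break
        else:
            # No existing leader is close enough, so p starts a new group.
            leader_list.append(p)
            followers.append([p])
    return leader_list, followers


if __name__ == "__main__":
    near = should_merge([(0.2, 0.2)] * 200, [(0.25, 0.2)] * 200)
    far = should_merge([(0.2, 0.2)] * 200, [(0.8, 0.8)] * 200)
    print(near, far)
    print(leaders([(0, 0), (0.1, 0), (5, 5), (5.1, 5)], tau=1.0)[0])
```

Note that the bound is loose for small groups: with two points per cluster and `value_range=1.0` it exceeds 1, so any pair would pass the test. A criterion of this kind only becomes discriminating once clusters have accumulated enough points, which is one plausible reason to control the merging order, as the abstract describes doing with density.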

Keywords/Search Tags:

clustering, density, random variable, statistical merging, sampling, leader
