Research And Improvement Of Clustering Algorithm

Posted on: 2018-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: S Li
Full Text: PDF
GTID: 2348330518993321
Subject: Information and Communication Engineering
Abstract/Summary:
Cluster analysis is an important data mining technique. Its purpose is to divide data into clusters according to similarity, and it is widely used in many fields. Building on a study of classical clustering algorithms, this thesis focuses on the density peaks clustering algorithm (DPC). First, its principle, performance, advantages, and disadvantages are analyzed. Then the radius selection and noise filtering parts of the algorithm are optimized and improved. Finally, following the idea of clustering fusion, DPC is combined with classical clustering algorithms, and two new algorithms are proposed. The main work of this thesis is as follows:

1. A method for automatically determining the radius is designed. In the DPC algorithm, the user must manually enter the radius dc, and the value of dc affects the performance of the algorithm. To address this problem, this thesis designs a method that selects the radius automatically based on the density features of cluster extreme points. Results on the test datasets show that the method gives better clustering performance and improves the robustness of the algorithm.

2. Several noise filtering methods are designed and optimized. The DPC algorithm filters noise using a boundary density: every data point in a cluster whose density is below the boundary density is treated as noise, where the boundary density is the maximum of the mean densities of the data points at which different clusters intersect. However, some data points are mistakenly identified as noise. To solve this problem, this thesis optimizes the boundary density and designs several noise filtering methods based on density and on the minimum distance to a point of higher density (the "high density minimum distance"), then analyzes and compares them. Test results on the datasets show that these methods are more accurate than the original method and improve the noise resistance of the algorithm.

3. DPC is combined with the K-means algorithm to propose the DP-Kmeans algorithm. DP-Kmeans first filters noise with the method designed above, which eliminates the interference of isolated points with the clustering and improves noise resistance. It then selects the k points with the largest high density minimum distance as the initial cluster centers for K-means. This addresses the drawback that K-means chooses its initial cluster centers randomly, which makes the clustering result unstable and prone to falling into a local optimum, and thereby improves clustering efficiency.

4. DPC is combined with the DBSCAN algorithm to propose the DP-DBSCAN algorithm. DP-DBSCAN divides cluster centers into high-density and low-density centers according to their density; high-density centers are clustered with the density-descending method of DPC, and low-density centers are clustered with DBSCAN. DP-DBSCAN addresses the poor clustering results that DPC produces when cluster density is relatively uniform, a problem caused by DPC assigning data points in descending order of density, and thereby improves clustering efficiency. Compared with DBSCAN, the algorithm is more robust and its clustering results are better.

Finally, the algorithms are implemented in Matlab. In the experiments, their performance is verified on artificial datasets and UCI standard datasets. The results show that DP-DBSCAN clusters better than DPC and DBSCAN and is more robust than DBSCAN. DP-Kmeans clusters better than K-means, with stronger stability and better noise resistance. On spherical datasets, its clustering efficiency is better than that of DP-DBSCAN and DPC.
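The DP-Kmeans idea described in the abstract (filter noise, then seed K-means with the k points of largest high density minimum distance) can be illustrated with a minimal Python sketch. This is not the thesis's Matlab implementation. It assumes the usual DPC cutoff-count definition of density, takes the radius `dc` as a manual input instead of using the automatic selection method, and replaces the thesis's boundary-density noise filters with a simple hypothetical threshold `min_rho`. The function names `dpc_quantities` and `dp_kmeans_sketch` are illustrative only.

```python
import math


def dpc_quantities(points, dc):
    """Return (rho, delta) per point, using the usual DPC definitions (assumed here).

    rho[i]   = number of other points closer than the cutoff radius dc
    delta[i] = distance to the nearest point of higher density; for the
               densest point, its largest distance to any other point
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Rank by descending density; ties are broken by index so that every
    # point except the first has at least one "denser" predecessor.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])
    for pos in range(1, n):
        i = order[pos]
        delta[i] = min(dist[i][j] for j in order[:pos])
    return rho, delta


def dp_kmeans_sketch(points, dc, k, min_rho=1, iters=20):
    """Seed K-means with the k non-noise points of largest delta.

    min_rho is a hypothetical stand-in for the thesis's noise filter:
    points with rho < min_rho are treated as noise and labelled -1.
    """
    rho, delta = dpc_quantities(points, dc)
    kept = [i for i in range(len(points)) if rho[i] >= min_rho]
    seeds = sorted(kept, key=lambda i: -delta[i])[:k]
    centers = [tuple(points[i]) for i in seeds]

    def nearest(p):
        return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))

    # Standard Lloyd iterations on the non-noise points only.
    for _ in range(iters):
        assign = [nearest(points[i]) for i in kept]
        new_centers = []
        for c in range(len(centers)):
            members = [points[i] for i, a in zip(kept, assign) if a == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers

    labels = [-1] * len(points)
    for i in kept:
        labels[i] = nearest(points[i])
    return labels, centers


# Toy data (made up for illustration): two tight groups and one isolated point.
group_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
group_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
points = group_a + group_b + [(10.0, 0.0)]

labels, centers = dp_kmeans_sketch(points, dc=0.12, k=2)
print(labels)
print(centers)
```

In this toy run the isolated point has no neighbours within `dc`, so it is labelled -1 and never influences the cluster centers. Because the seeds are chosen deterministically from density information, repeated runs give the same result, which is the stability property the abstract attributes to DP-Kmeans.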
Keywords/Search Tags: clustering analysis, density peak, fusion of clustering