An Improved Density Peaks Clustering Algorithm

Posted on: 2019-08-04  Degree: Master  Type: Thesis
Country: China  Candidate: J Liu  Full Text: PDF
GTID: 2428330596463198  Subject: Software engineering
Abstract/Summary:
For the many data sets that lack labels, clustering plays an important role in uncovering the basic structure and characteristics of the data. Many algorithms, such as density-based and grid-based algorithms, can identify clusters of irregular shape. These algorithms achieve very good clustering results once appropriate parameters are set, but such parameters are not easy to obtain. A. Laio proposed an algorithm called DP, whose main advantages are that it uses a decision graph to quickly find the points whose density and distance are both large and takes those points as cluster centers, and that it sets a truncation percentage to reduce the influence of parameters on the clustering result. The DP algorithm nevertheless has three main problems. First, the candidate center points still have to be confirmed by human judgment, which may lead to wrong centers. Second, each point is assigned to a cluster based only on its nearest neighbor of higher density, which leads to misclassification and unclear cluster boundaries. Third, the truncation percentage is difficult to choose properly (with the same truncation percentage, performance varies greatly across data sets with different distributions). These problems are explained in detail below. To solve them, an improved DP algorithm based on a hierarchical method is proposed. It is a top-down hierarchical clustering algorithm whose main principle is to split clusters at the region of lower density between two centers in order to find the center points. The algorithm has two advantages. First, it determines the centers automatically, partitions cluster boundaries very accurately, handles unevenly distributed data well, and adapts to more complex data sets. Second, it greatly reduces the sensitivity to the truncation percentage setting of the original DP algorithm. In the subsequent re-clustering step, it only needs to merge clusters according to the connectivity between points. In addition, the proposed algorithm performs well on several data sets...
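To make the baseline concrete, the following is a minimal sketch of the original DP procedure that the abstract criticizes, not the improved hierarchical algorithm proposed in the thesis, whose details the abstract does not give. It assumes a cutoff-kernel density, takes the cutoff distance from the truncation percentage, and picks centers by the largest product of density and distance, which stands in for reading the decision graph by eye. The function name `density_peaks`, its parameters, and the toy data are illustrative assumptions.

```python
import math


def density_peaks(points, cutoff_percent=2.0, n_centers=2):
    """Sketch of the original DP steps (names and defaults are assumptions)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Cutoff distance d_c: the pairwise distance found at the truncation percentage.
    pairwise = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    idx = min(len(pairwise) - 1, max(0, int(len(pairwise) * cutoff_percent / 100.0)))
    d_c = pairwise[idx]

    # Local density rho_i: number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point of higher density (its "parent").
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # the densest point gets the largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Assumption: choose centers by largest rho * delta instead of by human judgment.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_centers]

    # Every other point inherits the label of its nearest higher-density neighbor.
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers, rho, delta


# Toy data: two well-separated squares, each with a point in the middle.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers, rho, delta = density_peaks(points, cutoff_percent=20.0, n_centers=2)
print(labels, centers)
```

The last loop is the assignment rule named in the second problem: a point's cluster depends only on its single nearest neighbor of higher density, so one wrong link propagates to every point that later inherits from it. The `cutoff_percent` argument corresponds to the truncation percentage of the third problem, since it alone fixes `d_c` and therefore every density value.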
Keywords/Search Tags:Clustering, Density-Based Clustering, Hierarchical clustering, Data mining