
Research on Two Improved Density Peaks Clustering Algorithms

Posted on: 2019-06-11 | Degree: Master | Type: Thesis
Country: China | Candidate: H Q Yan
GTID: 2348330566464604 | Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is an unsupervised machine learning method that groups data points without any class-label information. It partitions samples into groups of similar samples according to a predefined similarity or dissimilarity measure. Its applications range from astronomy to bioinformatics, bibliometrics and information security. Since it was first proposed, cluster analysis has developed rapidly and attracted wide attention, and many new clustering algorithms have appeared. However, these algorithms tend to work well only on particular datasets and are hard to apply broadly, and most of them require some predefined parameters. How to design an efficient clustering algorithm that can handle datasets with different distributions and different dimensionalities therefore remains an active research topic.

Density-based clustering algorithms differ from most other types of clustering algorithms: they are not purely distance-based, and instead classify data points using a series of quantities related to the local density of each point. As a result, they can find clusters of arbitrary shape and distribution, and they do not suffer from the drawback of recognizing only globular clusters.

In recent years there has been much research on density-based clustering. The most representative algorithm is density peaks clustering, DPC (Clustering by Fast Search and Find of Density Peaks), which has a simple model and produces highly accurate clustering results. It requires only a few predefined parameters and shares the advantages of other density-based clustering algorithms. In the original DPC paper, the authors designed a heuristic tool called the decision graph for selecting cluster centroids: the user selects the centroids manually from the decision graph. Once the centroids are selected, DPC forms clusters by assigning each remaining data point to the same cluster as its nearest neighbor with higher density.

Although the decision graph addresses the problem of selecting cluster centroids, a threshold value still has to be chosen manually, and an improper threshold leads to poor clustering results. In addition, this simple threshold-based method has difficulty identifying cluster centroids that have low density values. An effective method for identifying cluster centroids automatically is therefore needed. Our experiments also showed that DPC performs poorly on complex or high-dimensional datasets: the traditional Gaussian kernel density estimation it relies on can give inaccurate density estimates, which in turn degrade the clustering results.

To solve these two problems, two improved density peaks clustering algorithms are designed. The first is a density peaks clustering algorithm based on an improved potential-based density estimation method, which estimates density values through a double k-nearest-neighbor (KNN) method and potential value computation. The second is a density peaks clustering algorithm that selects cluster centroids from the decision graph automatically, using statistical analysis and interval estimation. Both methods are simple and are shown to be very effective at identifying different kinds of clusters. This thesis describes the two algorithms in full detail and evaluates them on synthetic and real-world datasets.
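The baseline DPC pipeline that both improvements build on (Gaussian-kernel local density ρ, distance δ to the nearest higher-density point, centroid selection on the decision graph, then assignment of each point to its nearest higher-density neighbor) can be sketched as below. This is a minimal sketch assuming Euclidean distance and a user-chosen cutoff `dc`; all function names are hypothetical. The automatic rule in `select_centers` (keep points whose γ = ρ·δ exceeds mean + k·std) is a simple illustrative stand-in, not the thesis's statistic-based interval-estimation method, and the thesis's double-KNN potential-based density estimate is not reproduced here.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for an (n, d) array of points."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def gaussian_density(D, dc):
    """Gaussian-kernel local density: rho_i = sum_{j != i} exp(-(d_ij / dc)^2)."""
    # The diagonal term (j == i) contributes exp(0) = 1 per row, so subtract it.
    return np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0


def delta_and_parent(D, rho):
    """delta_i: distance to the nearest point of higher density (ties broken by
    sort order); parent_i: that point. The densest point gets max_j d_ij."""
    n = len(rho)
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = D[i].max()
            continue
        higher = order[:rank]  # all points ranked denser than i
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        parent[i] = j
    return delta, parent, order


def select_centers(rho, delta, order, k=3.0):
    """HYPOTHETICAL automatic rule (not the thesis's method): treat
    gamma = rho * delta as a sample and keep points above mean + k * std."""
    gamma = rho * delta
    centers = np.flatnonzero(gamma > gamma.mean() + k * gamma.std())
    # The densest point has no higher-density parent, so it must be a center.
    return np.union1d(centers, [order[0]])


def dpc(X, dc, k=3.0):
    """Return (labels, center indices) for the points in X."""
    D = pairwise_distances(X)
    rho = gaussian_density(D, dc)
    delta, parent, order = delta_and_parent(D, rho)
    centers = select_centers(rho, delta, order, k)
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(len(centers))
    # Visit points by decreasing density, so each parent is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

For example, `dpc(X, dc=1.0)` on two well-separated Gaussian blobs should give each blob's density peak a large γ (high ρ and a δ roughly equal to the gap between blobs), so those two points stand out on the decision graph. A fixed mean + k·std cut is exactly the kind of manual threshold the abstract criticizes: it can miss low-density centroids, which is the motivation for the statistic-based selection method.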
Keywords/Search Tags: Cluster analysis, density estimation, cluster centroid identification, density peak clustering