
Improvement of Density Peak Clustering Algorithm and Its Customer Segmentation Application

Posted on: 2021-03-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Chen
GTID: 2518306248967729 | Subject: Management Science and Engineering
Abstract/Summary:
Density Peak Clustering (DPC) was published in Science in 2014. DPC is a density-based clustering algorithm. Compared with other clustering algorithms, DPC can recognize non-spherical data sets and detect clusters of different sizes and densities. However, DPC still has several problems: (1) its clustering performance and its assignment strategy have shortcomings, especially on data sets with large density differences and irregular shapes; (2) it is difficult to choose the cluster centers correctly from the decision graph; and (3) it is difficult to identify outliers.

In 2018, drawing on the idea of gravitation, Hao et al. proposed an improved density peak clustering algorithm named Gravitation-based Density Peaks Clustering (GDPC). It solves two problems that DPC shows on some data sets: the cluster centers are not obvious in the decision graph, and outliers are difficult to identify. By applying gravitation theory to DPC, GDPC can accurately detect cluster centers and outliers. However, GDPC shares DPC's weaknesses in the clustering stage: its assignment process is not well founded, its clustering performance is limited, and it has particular difficulty with data sets that have large density differences and irregular shapes.

To address the shortcomings of the assignment process in DPC and GDPC, this thesis proposes three improved algorithms that build on the strengths of the original DPC and GDPC algorithms:

Improved algorithm 1: a density peaks clustering algorithm that uses k nearest neighbors to improve the assignment process (DPC-KNN). DPC-KNN integrates the idea of k nearest neighbors into the distance computation and the assignment process, which makes the assignment more reasonable. As a result, DPC-KNN performs better than the original DPC algorithm on some non-spherical data sets, such as the Spiral data set.

Improved algorithm 2: a density peaks clustering algorithm based on the logistic distribution and gravitation (DPC-LG). DPC-LG uses the probability density function of the logistic distribution to improve the local density of the GDPC algorithm, which improves clustering performance. DPC-LG adjusts the ranking of local densities in a reasonable way, so it can effectively identify clusters of different densities and irregular shapes and obtains good clustering results. However, on some data sets it is difficult to select DPC-LG's cluster centers manually: it is easy to select one center too many or to miss one, so automatic cluster center selection is needed.

Improved algorithm 3: a density peak clustering algorithm based on the logistic distribution and gravitation that determines cluster centers automatically (ADPC-LG). Using the normal distribution and other statistical tools, ADPC-LG sets screening conditions for cluster centers and thereby selects them automatically. Tests on UCI data sets show that ADPC-LG can handle data sets of different shapes and selects cluster centers automatically with higher accuracy.

Finally, to improve the accuracy of customer segmentation (clustering) and reduce customer management costs, this thesis takes a sample of the "Global Superstore" consumption data, selects relevant attribute variables according to the basic principles of the RFM model, and clusters the data with the three improved DPC algorithms. The results show that DPC-KNN achieves good clustering results and is more suitable for this data set than the other two algorithms. The clustering results can help enterprises formulate marketing strategies.
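For readers new to DPC, the following minimal NumPy sketch shows the standard procedure that all three improved algorithms start from. Each point gets a local density ρ and a distance δ to the nearest point of higher density. Points where both values are large are the cluster centers, which is what the decision graph displays, and every other point inherits the label of its nearest higher-density neighbor. This is not code from the thesis. The Gaussian kernel (one common choice), the cutoff `dc`, and picking centers by ranking γ = ρ·δ instead of reading the decision graph by eye are assumptions made for illustration.

```python
import numpy as np

def dpc(X, dc, n_centers):
    """Minimal Density Peak Clustering sketch.

    X: (n, d) array of points; dc: cutoff distance used as the kernel width;
    n_centers: number of clusters (assumed >= 1). Centers are taken as the
    points with the largest gamma = rho * delta instead of being picked by
    eye from the decision graph.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density with a Gaussian kernel; subtract the point's own term exp(0) = 1.
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0
    # Visit points from highest to lowest density.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    parent = np.full(n, -1)  # index of the nearest point of higher density
    delta[order[0]] = D[order[0]].max()
    for r in range(1, n):
        i, higher = order[r], order[:r]
        j = higher[np.argmin(D[i, higher])]
        delta[i], parent[i] = D[i, j], j
    # Decision-graph step: large rho and large delta mark a center.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    # One-pass assignment in decreasing density: each point takes the label
    # of its nearest higher-density neighbor, which is already labeled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta, centers
```

This one-pass assignment is the step that the abstract criticizes. A single wrong parent link near a cluster boundary propagates to every point that descends from it.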
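The abstract does not give DPC-KNN's formulas. The following is therefore only a hypothetical sketch of how k nearest neighbors can enter both the density computation and the assignment step. Here the local density is exp(-mean squared distance to the k nearest neighbors). Labels spread from the centers along k-nearest-neighbor links, and any point those links do not reach falls back to DPC's nearest-higher-density rule. The function name `dpc_knn_sketch`, the density formula, and the breadth-first propagation are assumptions, not the thesis's method.

```python
import numpy as np
from collections import deque

def dpc_knn_sketch(X, k, n_centers):
    """Hypothetical DPC variant using k nearest neighbours for density and assignment.

    Assumes n_centers >= 1, k < len(X), and features on comparable scales.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Dk = D.copy()
    np.fill_diagonal(Dk, np.inf)  # a point is not its own neighbour
    knn = np.argsort(Dk, axis=1, kind="stable")[:, :k]
    # kNN local density: high when the k nearest neighbours are close.
    rho = np.exp(-np.mean(np.take_along_axis(D, knn, axis=1) ** 2, axis=1))
    # delta: distance to the nearest point of higher density (as in plain DPC).
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[order[0]] = D[order[0]].max()
    for r in range(1, n):
        i, higher = order[r], order[:r]
        j = higher[np.argmin(D[i, higher])]
        delta[i], parent[i] = D[i, j], j
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    # Assignment step 1: spread labels from the centers along kNN links (BFS).
    queue = deque(centers)
    while queue:
        i = queue.popleft()
        for j in knn[i]:
            if labels[j] == -1:
                labels[j] = labels[i]
                queue.append(j)
    # Assignment step 2: points unreachable via kNN links use the DPC rule;
    # visiting in density order guarantees the parent is already labeled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels
```

Spreading labels along kNN links is meant to let a label follow a thin, curved structure such as a spiral arm. Along such an arm the nearest higher-density point can sometimes lie on a different arm.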
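DPC-LG builds its local density on the probability density function of the logistic distribution, f(x; μ, s) = e^(-(x-μ)/s) / (s·(1 + e^(-(x-μ)/s))²). The abstract does not say how distances enter this density or how it is combined with GDPC's gravitation term. The sketch below therefore assumes the simplest form, ρ_i = Σ_{j≠i} f(d_ij; 0, dc), with the cutoff distance dc as the scale. It covers the density term only, not the full DPC-LG algorithm, and the function names are hypothetical.

```python
import numpy as np

def logistic_pdf(x, mu=0.0, s=1.0):
    """Density of the logistic distribution with location mu and scale s > 0."""
    # The pdf is symmetric about mu, so using |z| avoids overflow in exp.
    z = np.abs((np.asarray(x, dtype=float) - mu) / s)
    e = np.exp(-z)
    return e / (s * (1.0 + e) ** 2)

def logistic_local_density(X, dc):
    """Assumed DPC-LG-style density: sum of logistic pdf values over pairwise distances."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = logistic_pdf(D, mu=0.0, s=dc)
    np.fill_diagonal(W, 0.0)  # a point does not contribute to its own density
    return W.sum(axis=1)
```

The logistic density decays like e^(-d/s), which is slower than a Gaussian kernel's e^(-d²/dc²). Moderately distant points therefore still add a little to ρ, and this is one way the ranking of points with similar densities can change.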
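ADPC-LG screens cluster centers with conditions derived from the normal distribution, but the abstract does not state those conditions. The sketch below assumes one simple rule in that spirit. It treats γ = ρ·δ as roughly normally distributed and keeps the points whose γ lies more than c standard deviations above the mean. The function `select_centers` and the default c = 2 are hypothetical choices for illustration.

```python
import numpy as np

def select_centers(rho, delta, c=2.0):
    """Automatically pick cluster centers as outliers of gamma = rho * delta.

    Hypothetical screening rule: keep points with gamma > mean + c * std.
    Returns center indices sorted by decreasing gamma.
    """
    gamma = np.asarray(rho, dtype=float) * np.asarray(delta, dtype=float)
    threshold = gamma.mean() + c * gamma.std()
    centers = np.flatnonzero(gamma > threshold)
    if centers.size == 0:  # always return at least one center
        centers = np.array([int(np.argmax(gamma))])
    return centers[np.argsort(-gamma[centers], kind="stable")]
```

A fixed c trades extra centers against missed ones, which are the same two errors the abstract reports for manual selection. Any real rule would need to be checked against labeled data such as the UCI sets.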
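The customer segmentation step turns transaction records into RFM features before clustering. These are Recency (days since the customer's last order), Frequency (number of distinct orders), and Monetary value (total spending). The sketch below assumes each record is a (customer ID, order ID, order date, sales) tuple. It also assumes a snapshot date for measuring recency and min-max scaling of the three features. All three are assumptions for illustration rather than the thesis's exact variable selection, and the sample records are made up.

```python
from collections import defaultdict
from datetime import date

def rfm_features(orders, snapshot):
    """Compute (recency_days, frequency, monetary) per customer.

    orders: iterable of (customer_id, order_id, order_date, sales) tuples
    (hypothetical layout); snapshot: date the recency is measured from.
    """
    last_order = {}
    order_ids = defaultdict(set)
    spend = defaultdict(float)
    for customer, order_id, order_date, sales in orders:
        if customer not in last_order or order_date > last_order[customer]:
            last_order[customer] = order_date
        order_ids[customer].add(order_id)  # one order can have several line items
        spend[customer] += sales
    return {
        c: ((snapshot - last_order[c]).days, len(order_ids[c]), spend[c])
        for c in last_order
    }

def min_max_scale(rows):
    """Scale each column of a list of equal-length rows to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Example with made-up records (not data from the thesis).
orders = [
    ("C1", "O1", date(2014, 12, 1), 100.0),
    ("C1", "O1", date(2014, 12, 1), 50.0),
    ("C1", "O2", date(2014, 6, 1), 20.0),
    ("C2", "O3", date(2014, 1, 15), 500.0),
]
rfm = rfm_features(orders, snapshot=date(2015, 1, 1))
scaled = min_max_scale([rfm[c] for c in sorted(rfm)])
```

Scaling keeps the monetary column from dominating the distance computations that every DPC variant relies on. Note that a lower recency value means a more recent customer.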
Keywords: Density peak algorithm, Universal gravitational law, K-nearest neighbor, Logistic distribution, Cluster center