Cluster analysis is one of the major techniques in data mining and has been widely applied in artificial intelligence. Because both the definition of a cluster and the strategy for clustering vary, many clustering algorithms have appeared in the literature. In general, any algorithm that groups unlabeled data into several clusters can be called a clustering algorithm. According to their underlying ideas or assumptions, clustering algorithms can be divided into several branches: partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, etc. This paper proposes a new branch of clustering algorithms based on local centrality measures. Specifically:

1) The concept of a “local centrality measure” (LCM) is introduced. An LCM indicates how close a point is to its nearest local center. Estimating the LCM correctly is essential for distinguishing data points in central cluster areas from those in border cluster areas. From the viewpoint of this paper, the density used in density-based clustering algorithms plays the role of an LCM: data points whose density exceeds a predetermined threshold are treated as core points, data points whose density falls below the threshold are treated as border points, and the points are then connected with one another to produce the final clustering. Empirically, central cluster areas have larger densities and border cluster areas have smaller densities. As a result, and supported by a systematic mathematical theory, density became the first widely used LCM. However, density has some drawbacks as an LCM. First, the threshold density is hard to estimate without empirical knowledge, which makes the clustering algorithm sensitive to its parameters. Second, different clusters may require different threshold densities, so density-based clustering algorithms may not handle imbalanced data properly. In summary, new LCMs are still needed.

2) Several new LCMs are designed. The correctness of the clustering results depends on the accuracy of the LCMs. In addition, LCMs should satisfy two other properties: stableness and robustness. Stableness requires a stable range of threshold values and low parameter sensitivity. Robustness requires that the LCMs are not susceptible to imbalanced data. This paper derives several LCMs aimed at stableness and robustness from the mean shift and the local gravitation model.

3) The local gravitation model and new clustering strategies are proposed. Based on the diversity of the designed LCMs, this paper proposes new clustering algorithms called LGC and CLA. Their parameters are much easier to preset, and the clustering results are improved significantly.

4) A new nonparametric test technique for multiple validity indexes is proposed. Several validity indexes for clustering results are popular in the literature, for instance the Rand index (RI), the adjusted Rand index (ARI), and normalized mutual information (NMI). It does not make sense to compare the NMI value of clustering algorithm A with the ARI value of clustering algorithm B. This paper designs a new nonparametric technique that uses ranks to compare results across different validity indexes.
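The rank idea in contribution 4) can be illustrated with a minimal sketch. It is not the paper's test procedure, which the abstract does not specify. It shows only one plausible first step: converting the scores within each validity index to ranks, so that values from different indexes are never compared directly, and then averaging the ranks per algorithm. The function names `rank_descending` and `mean_ranks`, the algorithm labels, and all scores are hypothetical, and every index is assumed to be "higher is better".

```python
from statistics import mean


def rank_descending(scores):
    """Rank algorithms so the highest score gets rank 1.

    Tied scores share the average of the ranks they occupy.
    """
    order = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(order):
        # Extend j to cover the run of algorithms tied with order[i].
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for name in order[i:j + 1]:
            ranks[name] = average_rank
        i = j + 1
    return ranks


def mean_ranks(results):
    """results maps index name -> {algorithm: score}.

    Returns the mean rank of each algorithm across all indexes.
    """
    per_index = [rank_descending(scores) for scores in results.values()]
    algorithms = list(per_index[0])
    return {a: mean(r[a] for r in per_index) for a in algorithms}


# Made-up scores for three hypothetical algorithms, for illustration only.
results = {
    "RI": {"A": 0.90, "B": 0.85, "C": 0.85},
    "ARI": {"A": 0.70, "B": 0.75, "C": 0.60},
    "NMI": {"A": 0.80, "B": 0.78, "C": 0.65},
}

if __name__ == "__main__":
    for algorithm, rank in sorted(mean_ranks(results).items(), key=lambda kv: kv[1]):
        print(f"{algorithm}: mean rank {rank:.3f}")
```

Because only the ordering within each index is used, an RI of 0.90 is never weighed against an ARI of 0.70. A significance test on the resulting ranks (for example, a Friedman-type test) would be a separate step and is not shown here.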