Research On Clustering Data By Local Centrality Measures

Posted on: 2019-10-21

Degree: Doctor

Type: Dissertation

Country: China

Candidate: Z Q Wang

GTID: 1368330566987167

Subject: Computer Science and Technology

Abstract/Summary:

Cluster analysis is one of the major techniques in data mining and has been widely applied in the field of artificial intelligence. Because both the definition of a cluster and the strategy for clustering vary, many clustering algorithms have appeared in the literature. In general, any algorithm that groups unlabeled data into several clusters can be called a clustering algorithm. According to their basic clustering ideas or assumptions, clustering algorithms can be divided into several branches: partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, etc. This dissertation proposes a new branch of clustering algorithms based on local centrality measures. Specifically:

1) The concept of the “local centrality measure” (LCM) is proposed. An LCM indicates how close a point is to its nearest local center. Estimating the LCM correctly is essential for distinguishing data points in the central areas of a cluster from those in its border areas. From the viewpoint of this dissertation, the density used in density-based clustering algorithms plays the role of an LCM: data points whose density is larger than a predetermined threshold are grouped as core points, data points whose density is smaller than the threshold are grouped as border points, and these points are connected with each other to produce the final clustering result. Empirically, central cluster areas have larger densities and border cluster areas have smaller densities. Backed by a systematic mathematical theory, density therefore became the first widely used LCM. However, density has drawbacks when it serves as an LCM. Firstly, the threshold density is not easy to estimate without empirical knowledge, which leads to parameter-sensitive clustering algorithms. Secondly, different clusters may have different proper threshold densities. Consequently, density-based clustering algorithms may not handle imbalanced data properly. In summary, it is still necessary to design new LCMs.

2) Several new LCMs are designed. The correctness of the clustering results depends on the accuracy of the LCMs. In addition, LCMs should satisfy two other properties: stableness and robustness. Stableness requires a stable range of threshold values and low parameter sensitivity. Robustness requires that the LCMs are not susceptible to imbalance problems. This dissertation derives several stable and robust LCMs from the mean shift and the local gravitation model.

3) The local gravitation model and new clustering strategies are proposed. Based on the diversity of the designed LCMs, this dissertation proposes new clustering algorithms called LGC and CLA. Their parameters are much easier to preset, and the clustering results are improved significantly.

4) A new nonparametric test technique for multiple validity indexes is proposed. Several validity indexes for clustering results are popular in the literature, for instance RI, ARI and NMI. It does not make sense to compare the NMI value of clustering algorithm A with the ARI value of clustering algorithm B. This dissertation designs a new nonparametric technique that uses ranks to compare results across different validity indexes.
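
The role of density as an LCM described in point 1), and the two drawbacks listed there, can be illustrated with a minimal sketch. It is not the dissertation's method: the neighbor-count density, the function names `local_density` and `split_core_border`, the `radius` and `threshold` parameters, and the toy data are all assumptions made for illustration. The step that connects points into final clusters is omitted.

```python
import math

def local_density(points, radius):
    """Density of each point, taken here as the number of other points
    within `radius` (an assumed, simple density estimate)."""
    return [
        sum(1 for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]

def split_core_border(points, radius, threshold):
    """Label a point 'core' if its density exceeds `threshold`,
    otherwise 'border' (density acting as the LCM)."""
    densities = local_density(points, radius)
    return ["core" if d > threshold else "border" for d in densities]

# Hypothetical toy data: one tight cluster and one loose cluster.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
sparse = [(5.0, 5.0), (5.6, 5.0), (5.0, 5.6), (5.6, 5.6)]
points = dense + sparse

# A radius tuned to the tight cluster marks every loose-cluster point as border.
labels_small = split_core_border(points, radius=0.2, threshold=2)

# A larger radius marks all points as core, so the outcome depends on the parameter.
labels_large = split_core_border(points, radius=1.0, threshold=2)

print(labels_small)
print(labels_large)
```

In this toy setting a single global radius and threshold cannot describe both clusters at once, which is the parameter sensitivity and imbalance problem that motivates the new LCMs.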

Keywords/Search Tags:

Local gravitation model, Density-based clustering algorithm, Local centrality measure, LCM clustering algorithm, LGC clustering algorithm, CLA clustering algorithm

Related items

1	Improved Affinity Propagation Clustering Algorithm Based On Multiple Theories And Its Applications
2	Research Of Clustering Algorithm Based On Data Local Distribution
3	Research Of Density-based Clustering Algorithm By KNN
4	Research On Local Density Clustering Algorithm
5	Research On Clustering Algorithm Based On Automatic Determination Of Class Number Technology
6	Clustering Algorithm In Data Mining Research
7	Research And Improvement On Density-Based Clustering Algorithm
8	A Fast Clustering Algorithm Based On Local Density And Framework Distance Between Clusters
9	Research And Implementation Of Web Document Clustering Algorithm Based On Semantic Gravitation And Density Distribution
10	Research On Clustering Algorithm Based On Density Peak And Its Application In Text Clustering