
Study On Local Cores-Based Clustering Algorithm And Measurement

Posted on: 2019-03-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D D Cheng
GTID: 1368330596958584
Subject: Software engineering
Abstract/Summary:
Data mining aims to find novel, potentially useful knowledge. Cluster analysis is one of the main tasks of data mining. It divides objects into several clusters so that similar objects fall in the same cluster while dissimilar objects fall in different clusters. It has been widely used in pattern recognition, image processing, artificial intelligence, medicine, genetic science, geology and management. Recently, with the development of information technology, data sets have grown larger and their structure has become more complex, which brings new challenges to research on cluster analysis. In this dissertation, we analyze the basic theory and algorithms of clustering and study the problem of clustering complex manifold data sets. The main work and achievements include the following aspects:

(1) Natural neighbor-based local cores are proposed. When clustering big data, the classic algorithms require a lot of time. To solve this problem, we select representatives from the data set and assign the remaining objects to the clusters their representatives belong to. Natural neighbor-based local cores use a natural neighbor-searching algorithm to obtain the local neighborhood of each object. In a local neighborhood, objects in dense areas have more neighbors, while objects in sparse areas have fewer neighbors. Each local core is the object with the greatest density in its local neighborhood. Each remaining object is assigned to the cluster its representative belongs to, so the data set is divided into several clusters, which makes it easier to cluster complex manifold data sets. Applying the local cores to the density peaks (DP) algorithm, a hierarchical clustering algorithm and a minimum spanning tree-based clustering algorithm reduces the time complexity of these algorithms and demonstrates the effectiveness of using the local cores in place of the original data set.

(2) A novel local cores-based DP algorithm, DPLORE, is proposed. Researchers have proposed using geodesic distance to measure the dissimilarity between objects on a manifold. Because prior knowledge is lacking, the exact geodesic distance cannot be obtained, and the shortest path is a good approximation of it. However, calculating the shortest path between all pairs of objects has a high time complexity. Therefore, we use the local cores instead of the whole data set for the calculation. The DPLORE algorithm first finds the local cores, then introduces an adaptive distance to measure the distance between local cores, and finally uses the DP algorithm to cluster the local cores. Owing to the use of natural neighbors and adaptive distances, the algorithm does not need parameters to be set and can find complex manifold clusters. Experiments show that, compared with existing algorithms, DPLORE has advantages in finding complex manifold clusters.

(3) A local cores-based hierarchical clustering algorithm, HCLORE, is proposed. When recognizing patterns in complex structures, humans tend to discover obvious clusters in dense regions first and then deal with objects on the border, which eliminates the interference of noise points. Inspired by this idea, we propose a hybrid hierarchical clustering algorithm, HCLORE, which combines "top-down" and "bottom-up" strategies. It first partitions the data set into several clusters by finding local cores, instead of optimizing an objective function through iteration. It then determines a density threshold from the density increasing curve, which eliminates the influence of low-density objects and makes the boundaries between clusters clearer. Finally, it redefines the similarity between clusters in order to merge the sub-clusters, which makes the algorithm applicable to complex manifold data sets. Experiments on synthetic and real data sets show that the HCLORE algorithm has advantages over other algorithms for clustering complex manifold data sets.

(4) A local cores-based MST clustering algorithm, MSTLORE, is proposed. Existing MST-based clustering algorithms construct the minimum spanning tree on the original data set, which not only has high time complexity but is also easily affected by noise points. The local cores exclude the noise points while retaining the distribution of the original data set. Therefore, we combine the local cores with MST-based clustering and propose the MSTLORE algorithm. We define a shared neighbors-based distance to measure the dissimilarity between local cores. The MSTLORE algorithm constructs a minimum spanning tree on the local cores, instead of the original data set, using this shared neighbors-based distance, which reduces the running time and eliminates the noise points. Redefining the distance between local cores allows the algorithm to discover clusters with complex structures. Experiments on synthetic and real data sets show that the MSTLORE algorithm is more competitive than other algorithms.

(5) A local cores-based cluster measurement, LCCV, is proposed. To solve the problem that existing internal measures cannot evaluate complex manifold clusters, we use the shortest path to measure the dissimilarity between local cores, then evaluate the compactness and separation of each local core to obtain its cluster quality, and finally average the cluster quality over all local cores. Because it employs the shortest path to measure the dissimilarity between local cores, LCCV can effectively evaluate complex manifold clusters. We combine the LCCV index with the HCLORE algorithm to verify the validity of LCCV. Experiments show that, compared with other clustering metrics, LCCV has advantages in measuring complex manifold clusters.
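
The representative-selection idea in contribution (1) can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm. It assumes a fixed neighborhood size k in place of the natural neighbor search, takes density to be the inverse of the mean distance to the k nearest neighbors, and follows representatives transitively until a point represents itself. The function name local_cores and all of these choices are hypothetical and serve only to show how picking the densest object in each neighborhood yields a small set of cores.

```python
import math


def local_cores(points, k=3):
    """Return (cores, rep): core indices and each point's representative.

    Assumptions for illustration only: a fixed k replaces the
    natural-neighbor search, and density is 1 / (mean k-NN distance).
    """
    n = len(points)
    hoods, density = [], []
    for i in range(n):
        # The k nearest other points, closest first.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        # A neighborhood is the point itself followed by its neighbors.
        hoods.append([i] + others)
        mean_d = sum(math.dist(points[i], points[j]) for j in others)
        mean_d /= max(len(others), 1)
        density.append(1.0 / (mean_d + 1e-12))

    # Representative = densest member of the neighborhood. max() keeps the
    # first maximal item, so a point keeps itself when densities tie.
    rep = [max(h, key=lambda j: density[j]) for h in hoods]

    # Follow representatives until reaching a point that represents itself.
    # Density strictly increases along each chain, so the loop terminates.
    for i in range(n):
        r = rep[i]
        while rep[r] != r:
            r = rep[r]
        rep[i] = r

    cores = sorted(set(rep))
    return cores, rep


if __name__ == "__main__":
    # Two small, well-separated groups of 2-D points.
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
           (10, 10), (10.1, 10), (10, 10.1), (10.1, 10.1), (10.05, 10.05)]
    cores, rep = local_cores(pts, k=3)
    print("cores:", cores)
    print("representatives:", rep)
```

Any later step, such as DP, hierarchical or MST-based clustering, would then operate on the points indexed by cores rather than on the full data set, which is where the reduction in running time described above comes from.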
Keywords/Search Tags:local cores, complex manifold, clustering analysis, cluster measurement