
Study Of Boundary Detecting Algorithm For Each Cluster

Posted on: 2017-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: K Wang
Full Text: PDF
GTID: 2308330485487797
Subject: Software engineering
Abstract/Summary:
Clustering divides a data set into clusters according to the similarity between data objects. Boundary detection focuses on finding the boundary objects, which lie at the edges of clusters. Boundary objects belong to a cluster, yet they have features that distinguish them from the objects located inside it. Methods such as BOUND and BERGE have been proposed for boundary detection. Although those algorithms can effectively obtain the boundary of the whole data set, they cannot determine the number of clusters or the boundary of each individual cluster. To solve these problems, two boundary detecting algorithms for each cluster in a data set are proposed.

Boundary Detecting Algorithm for Each Cluster based on DBSCAN: DBORDER. First, the algorithm extracts all core points from the data set according to the core point percent and the density value of each data object. Then these core points form a number of connected undirected graphs. Each graph represents one cluster, so the number of graphs gives the number of clusters in the data set. Finally, the eps field of a data object is divided into two fields, the positive field and the negative field, and the boundary of each cluster or of the whole data set is detected from the distribution of the data objects located in the positive and negative fields of the given data object. Experimental results on many noisy data sets show that DBORDER can effectively obtain the number of clusters and the boundaries of clusters of different sizes and shapes.

Boundary Detecting Algorithm for Each Cluster with the Function of Clustering based on K nearest neighbors: KBORDER.
First, the K nearest neighbors (KNN) and the reverse K nearest neighbors (RKNN) of each data object are computed for the given value of K, and the boundary degree of every data object is calculated from its RKNN value. Then a concept named Reached Neighbors (RN) is proposed, based on the neighbor relationships between data points, and an edge is placed between any two points that satisfy the RN concept. This produces a number of connected undirected graphs. Each graph represents one cluster, so the graphs give both the number of clusters and the clustering division of the data set. Finally, the boundary of the whole data set or of each cluster is detected by combining the boundary degree with the boundary percent and the clustering division. Experimental results on many noisy data sets show that KBORDER can effectively obtain the clustering division and the boundary of the whole data set or of each cluster, for clusters of different sizes and shapes.
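The KBORDER steps can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several details are assumptions because the abstract does not define them: mutual K-nearest-neighbor membership stands in for the Reached Neighbors relation, a lower RKNN count is treated as a higher boundary degree, and the boundary percent is applied separately inside each cluster. The function names and the parameters `k` and `boundary_percent` are hypothetical.

```python
import math


def knn_lists(points, k):
    """For each point index, return the indices of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Ties in distance are broken by index so the output is deterministic.
        others.sort(key=lambda j: (math.dist(p, points[j]), j))
        result.append(others[:k])
    return result


def rknn_counts(knn):
    """Count, for each point, how many other points list it as a neighbor."""
    counts = [0] * len(knn)
    for neighbors in knn:
        for j in neighbors:
            counts[j] += 1
    return counts


def component_labels(n, edges):
    """Label the connected components of an undirected graph (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]


def kborder_sketch(points, k, boundary_percent):
    """Return (cluster label per point, boundary indices per cluster label)."""
    n = len(points)
    knn = knn_lists(points, k)
    counts = rknn_counts(knn)
    # ASSUMPTION: mutual kNN membership stands in for Reached Neighbors (RN).
    edges = [(i, j) for i in range(n) for j in knn[i]
             if i < j and i in knn[j]]
    labels = component_labels(n, edges)
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    boundaries = {}
    for label, members in clusters.items():
        # ASSUMPTION: fewer reverse neighbors means a higher boundary degree.
        ranked = sorted(members, key=lambda i: (counts[i], i))
        n_boundary = math.ceil(boundary_percent * len(members))
        boundaries[label] = ranked[:n_boundary]
    return labels, boundaries
```

For example, calling `kborder_sketch(points, k=4, boundary_percent=0.5)` on a list of 2-D tuples returns one label per point and, for each label, the half of that cluster with the fewest reverse neighbors. The brute-force neighbor search is quadratic in the number of points and is meant only for small illustrative inputs.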
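The cluster-counting part of DBORDER (core point extraction followed by connected graphs over the core points) can be sketched in a similar way. This is a rough illustration under stated assumptions, not the thesis method: density is taken as the number of other points within distance `eps`, the core points are the top `core_percent` fraction by density, and two core points are connected when they lie within `eps` of each other. The positive and negative field step is omitted because the abstract does not define those fields. The function name and parameters are hypothetical.

```python
import math


def dborder_core_components(points, eps, core_percent):
    """Group the densest points into connected components (one per cluster)."""
    n = len(points)
    # ASSUMPTION: density = number of other points within distance eps.
    density = [
        sum(1 for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= eps)
        for i in range(n)
    ]
    # ASSUMPTION: core points are the top core_percent fraction by density.
    n_core = max(1, math.ceil(core_percent * n))
    core = sorted(range(n), key=lambda i: (-density[i], i))[:n_core]

    # ASSUMPTION: core points within eps of each other share an edge.
    unvisited = set(core)
    components = []
    while unvisited:
        seed = unvisited.pop()
        component = [seed]
        stack = [seed]
        while stack:
            a = stack.pop()
            near = [b for b in unvisited
                    if math.dist(points[a], points[b]) <= eps]
            for b in near:
                unvisited.remove(b)
                component.append(b)
                stack.append(b)
        components.append(sorted(component))
    return sorted(components)
```

The number of returned components is the estimated number of clusters, for instance `len(dborder_core_components(points, eps=1.5, core_percent=0.5))`. Points that are not selected as core points are left unassigned in this sketch.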
Keywords/Search Tags: Clustering boundary, Cluster numbers, Border degree, Point density, K nearest neighbors, Reverse K nearest neighbors