Study Of Clustering Technology Based On Boundary Model

Posted on: 2018-09-29  Degree: Master  Type: Thesis
Country: China  Candidate: Q Han
GTID: 2348330515975212  Subject: Computer Science and Technology
Abstract/Summary:
Clustering is a technique that groups similar data points into the same cluster and dissimilar data points into different clusters. In data analysis, clustering can be used to analyze the structure of a data set and the relationships among its clusters, and it plays an important role in fields such as pattern recognition, biological monitoring, drug development and information security monitoring. However, because high-dimensional spatial data are sparse, existing clustering techniques find such data difficult to cluster and achieve low clustering precision on it. Unlike the traditional clustering approach, this thesis proposes several new clustering algorithms based on the idea of first searching for the cluster boundary points and then locating the cluster core points. The innovations are as follows:

This thesis proposes a new clustering algorithm, CASB (A Clustering With Affine Space Algorithm Based Boundary Detection), for high-dimensional data. The algorithm first establishes a cluster boundary model based on the invariance of the topological structure under affine-space transformations and uses it to find the cluster boundary; it then constructs a connection matrix from the boundary points; finally, it forms clusters by searching from the cluster boundary toward the interior points. Experimental results show that the algorithm can cluster high-dimensional data with various densities, sizes and shapes. Compared with similar algorithms, it achieves higher accuracy and its parameters are easier to select.

This thesis also proposes a new clustering algorithm, C-USB (A Clustering Algorithm Using Skewness-based Boundary Detection). The algorithm first puts forward a skewness hypothesis: the data around a cluster boundary point are skewed, that is, the boundary point and its nearby points are distributed asymmetrically in space. It then calculates the boundary degree of each data point from that point's skewness, and finally forms the connection matrix by deleting the neighbor relationships of data points according to the boundary points. Experimental results show that the algorithm can be used for clustering analysis of complex high-dimensional data sets while maintaining high accuracy, and that it also achieves excellent clustering results on large-scale data sets.

Finally, this thesis proposes a new clustering algorithm, CUSBD (Clustering Based On Skew-based Boundary Detection), for complex data. Similarly, this algorithm hypothesizes that the spatial distribution of a cluster boundary point and its nearby points is skewed, here modeled with the gamma distribution. Based on this hypothesis, it calculates the degree of skewness of each data point and its nearby points, treats this value as the point's boundary degree, and uses it to search for the cluster boundary; the clusters are then formed from a connection matrix built on the boundary points. Experimental results show that the algorithm effectively ensures clustering accuracy on data sets with various densities, sizes, shapes and scales, and that it is computationally convenient.
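The skewness hypothesis behind C-USB says that a boundary point sees its neighbors on one side only, while an interior point is surrounded. The abstract does not give the thesis's formula, so the following is only a minimal sketch under an assumed measure: the length of the mean unit vector from a point to its k nearest neighbors. The names `k_nearest` and `boundary_degree` and the parameter `k` are hypothetical, chosen for illustration.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (brute force, Euclidean)."""
    ranked = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    return [j for _, j in ranked[:k]]

def boundary_degree(points, i, k):
    """Hypothetical boundary degree in [0, 1]: length of the mean unit vector
    pointing from points[i] to its k nearest neighbours (an assumed measure,
    not the thesis's formula)."""
    p = points[i]
    resultant = [0.0] * len(p)
    for j in k_nearest(points, i, k):
        offset = [a - b for a, b in zip(points[j], p)]
        length = math.hypot(*offset)
        if length == 0.0:
            continue  # duplicate point: carries no direction information
        for d, component in enumerate(offset):
            resultant[d] += component / length
    # Interior points: neighbours on all sides, so the unit vectors cancel (near 0).
    # Boundary points: neighbours skewed to one side, so the value approaches 1.
    return math.hypot(*resultant) / k

grid = [(x, y) for x in range(5) for y in range(5)]
print(round(boundary_degree(grid, 12, 8), 3))  # centre (2, 2): 0.0
print(round(boundary_degree(grid, 0, 8), 3))   # corner (0, 0): 0.841
```

On a 5 x 5 grid the centre point scores 0 and the corner scores about 0.84; ranking points by this value and taking the highest scorers as boundary points would be one way to act on the hypothesis. The brute-force neighbor search is O(n² log n) and is meant only for small illustrative data.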
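CUSBD is described as modeling the skew with the gamma distribution, but the abstract does not say how the distribution is fitted or to which quantities. One simple, hypothetical way to turn a gamma model into a skewness value is a method-of-moments fit: with sample mean μ and variance σ², the shape is k = μ²/σ², and the skewness of a gamma distribution is 2/√k = 2σ/μ. The function name `gamma_skewness`, and the idea of feeding it a point's nearest-neighbor distances, are assumptions for illustration only.

```python
import math
import statistics

def gamma_skewness(samples):
    """Skewness of a gamma distribution fitted to positive samples by the
    method of moments (hypothetical helper, not the thesis's procedure)."""
    mean = statistics.fmean(samples)
    if mean <= 0.0:
        raise ValueError("gamma fit needs positive samples")
    var = statistics.pvariance(samples)
    if var == 0.0:
        return 0.0  # all samples equal: degenerate case, treated as unskewed
    shape = mean * mean / var      # k = mu^2 / sigma^2
    return 2.0 / math.sqrt(shape)  # gamma skewness 2 / sqrt(k), equal to 2 * sigma / mu

print(round(gamma_skewness([1.0, 2.0, 3.0]), 4))  # 0.8165
```

The method-of-moments fit is chosen here only because it has a closed form and needs nothing beyond the standard library; a maximum-likelihood fit would be a reasonable alternative.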
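All three algorithms finish by building a connection matrix from the detected boundary points. The abstract does not give the exact rule, so the following sketch assumes a simplified one: link each non-boundary (core) point to its k nearest core neighbors and drop every edge that touches a boundary point. Connected components then become clusters, and each boundary point joins the cluster of its nearest core point. The function `boundary_first_clusters`, its parameters, and this attachment rule are hypothetical. The sketch does not reproduce CASB's search from the boundary toward the interior.

```python
import math
from collections import deque

def boundary_first_clusters(points, is_boundary, k):
    """Hypothetical sketch: cluster core points through a kNN connection
    matrix with all edges at boundary points removed, then attach each
    boundary point to the cluster of its nearest core point."""
    n = len(points)

    def knn(i):
        order = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
        return [j for j in order if j != i][:k]

    # Symmetric connection matrix restricted to core (non-boundary) points.
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        if is_boundary[i]:
            continue
        for j in knn(i):
            if not is_boundary[j]:
                adj[i][j] = adj[j][i] = True

    # Breadth-first search labels each connected component of core points.
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if is_boundary[start] or labels[start] != -1:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and labels[v] == -1:
                    labels[v] = cluster
                    queue.append(v)
        cluster += 1

    # Boundary points inherit the label of their nearest core point.
    core = [i for i in range(n) if not is_boundary[i]]
    for i in range(n):
        if is_boundary[i] and core:
            nearest = min(core, key=lambda j: math.dist(points[i], points[j]))
            labels[i] = labels[nearest]
    return labels

line = [(float(x), 0.0) for x in range(8)]
flags = [False, False, False, True, True, False, False, False]
print(boundary_first_clusters(line, flags, 2))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In the usage example the points at x = 3 and x = 4 act as a bridge. If every point were treated as core, the k = 2 neighbor graph would join all eight points into one component. Removing the edges at the two boundary points splits the line into two clusters, and each bridge point then joins the side it is closer to.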
Keywords/Search Tags: clustering boundary, clustering algorithm, boundary degree, affine space, skew hypothesis, boundary model