| Cluster analysis is an important unsupervised learning method in data mining. By exploring the similarity characteristics of the data distribution, clustering algorithms minimize the distances within a cluster, so that similar points are gathered together, and maximize the distances between clusters, so that groups far from each other are separated. It is therefore a challenging research direction to identify boundaries more finely, so as to distinguish the main body of the data from its border, and to determine the true clusters under multiple types of data distribution by mining both the global and the local information of the data. This involves the identification of cluster borders, the determination of the number of clusters, the assessment of the structure of the data distribution, and time complexity, among other issues. In this paper, we take border identification as the starting point, study it in depth from the perspectives of density and grid, and propose three novel clustering algorithms based on triangle neighbors, feature extraction, and the local coefficient of variation. By mining and analyzing the latent structural properties of the data to peel off border points, these algorithms not only segment and aggregate the data effectively but also reduce the influence of human factors. In addition, the proposed algorithms can be applied to face recognition, precipitation distribution, and image segmentation. The specific results of this paper are as follows: (1) Clustering algorithm based on triangle neighbor connection. We propose a novel clustering algorithm based on triangle neighbor connection (TNCC). The motivations of TNCC are as follows: 1) representative core data points are obtained by using the nearest-neighbor parameter to define the search step size for traversing the data, which reduces unnecessary computation time, 2) effective segmentation of clusters is achieved by overcoming the excessive dependence on external factors and 
identifying the low-density boundaries between regions that lie close to each other, 3) the accuracy of the algorithm is improved by identifying and judging the type of border so that complex data can be clustered effectively. The algorithm repeatedly searches for core data points with a varying step size and expands the neighborhoods of the core points; cluster clipping points are then formed according to the triangular connection between distance and neighbor attributes and are used to segment and aggregate sub-clusters. The feasibility and effectiveness of TNCC are verified by clustering results on 14 synthetic and 9 real-world datasets. The generalization ability, stability, and distinctiveness of the algorithm are also confirmed by cross-validation and Friedman statistical tests on multiple synthetic datasets. An application to face recognition further demonstrates the practicality of TNCC. (2) A rolling iterative clustering model via extracting data features. We propose a rolling iterative clustering model (ROCM) based on circular partitioning of the data. The motivations of ROCM are as follows: 1) the data should be partitioned using circular structures of varying radius, 2) the difficulty of choosing the truncation distance and the number of clusters needs to be overcome, 3) the ability to handle low-density regions and border data should also be improved. ROCM divides the data into circular structures of different radii based on the local characteristics of the data distribution and maximizes the aggregation of related circular structures. From the radius and the number of data points in each circle, the generalized local density of the circle's representative point is calculated, and the cluster centers are determined dynamically by density peak clustering, which avoids manual selection of the truncation distance. The correlation principle between representative data points ensures that the cluster centers expand outward and adaptively obtain sub-clusters and border points, which realizes the 
segmentation between clusters. The associated scale guarantees the effective aggregation of sub-clusters. Experiments on datasets demonstrate the rationality and effectiveness of ROCM, which can effectively identify the boundaries of nearby clusters and avoids reliance on manual parameter tuning. (3) Cluster center selection algorithm with coefficient of variation. We propose an algorithm named VCCS, based on the local coefficient of variation and sequential distance. The motivations of VCCS are as follows: 1) the relationship between dense data is analyzed by separating out the boundary data with poor similarity, 2) the cluster centers and the number of clusters should not be adjusted manually, and the parameter settings and time complexity should be reduced. The VCCS algorithm defines a logarithmic penalty density to evaluate the similarity between data points and thereby reveal the degree of dispersion of the points. Boundary points are identified from the perspective of discrete density and sequential distance, which effectively reduces the association between clusters and avoids chain reactions. In addition, VCCS adaptively selects multiple cluster centers and expands their neighborhoods to obtain the final cluster centers automatically, which effectively handles regions with uneven distribution. Experiments on 15 datasets and on image segmentation verify the feasibility of the proposed algorithm. The algorithm can effectively identify the boundaries of nearby clusters and recognizes cluster centers in the low-density regions of non-convex and non-uniform datasets more accurately. The parameter sensitivity experiments show that the performance of the proposed algorithm does not change significantly over a wide range of parameter values, which reflects its stability. |
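
The idea of a local coefficient of variation, which VCCS uses to reveal how dispersed the points are, can be illustrated with a minimal sketch. The abstract does not give the exact definition, so the version below is only an assumption: for each point it takes the distances to its k nearest neighbors and divides their standard deviation by their mean. The function name `local_cv` and the parameter `k` are hypothetical and are not taken from the paper, and the logarithmic penalty density and sequential distance are not modeled here.

```python
import numpy as np


def local_cv(points, k=5):
    """Assumed definition: std / mean of each point's k nearest-neighbor distances."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row; the first column is the zero distance of a point
    # to itself, so it is skipped.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    mean = knn.mean(axis=1)
    std = knn.std(axis=1)  # population standard deviation (ddof=0)
    # Return 0 where the mean distance is 0 (e.g. duplicated points).
    return np.divide(std, mean, out=np.zeros_like(mean), where=mean > 0)


# Eleven evenly spaced points on a line: interior points see two equally
# distant neighbors, the two end points see neighbors at distances 1 and 2.
pts = np.arange(11, dtype=float).reshape(-1, 1)
cv = local_cv(pts, k=2)
```

In this toy example the interior points get a value of 0 and the two end points get 1/3, so under this assumed definition a higher value marks points whose neighborhood is one-sided, which is the kind of signal a border-identification step could threshold.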