In recent years, clustering has received increasing attention in machine learning and across many disciplines. Clustering groups samples into classes without any a priori information, so that data can be divided into class groups as needed. Three-way clustering is a new clustering approach that divides the clustering result into three regions. This division makes it possible to clearly distinguish the central region, the peripheral region, and the trivial region of each class group, which is very effective when the boundaries between class groups are unclear. Current three-way K-means algorithms, however, are sensitive to the choice of initial cluster centers and require the number of clusters to be fixed manually. To address these problems, this paper proposes a three-way K-means algorithm based on sample stability and similarity (TKSS). It combines sample stability and Euclidean distance to select the initial cluster centers, and uses sample similarity to select the number of clusters automatically. The main work of this paper is as follows:
(1) This paper reviews the background on three-way clustering algorithms that optimize the initial cluster centers and automatically select the number of clusters, and describes the concepts and methods used in this paper: sample stability, sample similarity, three-way decision, three-way clustering, selection of initial cluster centers, and the clustering evaluation indicators NMI, ARI, AS, and DBI.
(2) In the current three-way K-means clustering algorithm, random selection of the initial cluster centers makes the clustering results unstable, and the number of clusters must be fixed manually. To address these problems, this paper proposes the three-way K-means (TKSS) algorithm, which optimizes the selection of initial cluster centers and automatically obtains the optimal number of clusters based on the stability and similarity of the samples. The algorithm uses sample stability to partition the dataset and selects the initial cluster centers based on the Euclidean distance between samples in the highly stable subset; the optimal number of clusters is selected by iterating over candidate numbers of clusters under the guidance of the sample similarity index. TKSS was tested on eight datasets, two simulated and six real UCI datasets, using NMI, ARI, AS, and DBI as evaluation metrics, and compared with current three-way clustering algorithms and conventional clustering algorithms. The experimental results show that TKSS clusters data efficiently and with good clustering quality.
In summary, this paper improves the three-way K-means algorithm by combining sample stability and similarity, which avoids random selection of the initial cluster centers and selects the number of clusters automatically. The resulting improvement in clustering performance is expected to contribute to the development of clustering algorithms.
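To make the three-region idea concrete, the following is a minimal sketch of one possible three-way assignment step, given cluster centers that have already been chosen. It is not the TKSS procedure itself: the function name `three_way_assign` and the threshold `ratio` are assumptions introduced only for illustration, and the rule used here (a sample is placed in the peripheral region of every cluster whose center is almost as close as the nearest one) is one common way to express the idea, not necessarily the rule used in this paper.

```python
import math


def three_way_assign(samples, centers, ratio=1.5):
    """Split samples into central (core) and peripheral (fringe) regions.

    Assumption for illustration: a sample belongs to the core of its
    nearest cluster when no other center lies within `ratio` times the
    nearest distance. Otherwise it is placed in the fringe of every
    cluster whose center is that close. Samples in neither region of a
    cluster form that cluster's trivial region.
    """
    k = len(centers)
    core = [[] for _ in range(k)]
    fringe = [[] for _ in range(k)]
    for idx, x in enumerate(samples):
        # Euclidean distance from the sample to every center.
        dists = [math.dist(x, c) for c in centers]
        nearest = min(dists)
        # Clusters whose center is "almost as close" as the nearest one.
        close = [j for j, d in enumerate(dists) if d <= ratio * nearest]
        if len(close) == 1:
            core[close[0]].append(idx)
        else:
            for j in close:
                fringe[j].append(idx)
    return core, fringe


samples = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (5.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
core, fringe = three_way_assign(samples, centers)
print(core)    # indices of samples in the core of each cluster
print(fringe)  # indices of samples in the fringe of each cluster
```

In this toy example the sample at (5, 0) is equidistant from both centers, so it lands in the peripheral region of both clusters, while the remaining samples fall in the central region of their nearest cluster. A larger `ratio` widens the peripheral regions; a value of 1 reduces the scheme to ordinary hard assignment except for exact ties.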