In data mining, how to extract useful information is a central research question, and cluster analysis is one of the most important analysis methods. It also has important research significance for data visualization. Because of the complexity and diversity of data, clustering of mixed attribute data has become one of the hot issues in cluster analysis. Many existing clustering algorithms for mixed attribute data achieve good clustering results, but they have three weaknesses. First, they rely heavily on the initial values and the number of clusters; these parameters must be selected manually, which may lead to poor clustering results. Second, when calculating the distance between mixed attribute data objects, the data is generally split into a numerical part and a categorical part, the distance is computed separately for each part, and the two results are added together, which may lead to the loss of some information. Third, for data with complex shapes, some algorithms produce poor clustering results. To address these problems, the following research is done.
(1) For the problem that the K-means algorithm relies on the initial values and the number of clusters, the ACC algorithm is used to determine the initial values and the number of clusters for the K-means algorithm. Experimental verification on the UCI datasets shows that the resulting ACC-K-means algorithm has higher accuracy and better stability.
(2) Because a mixed attribute data object should be treated as a whole, this paper uses the Gower coefficient to process mixed attribute data. The K-prototype algorithm also relies on the initial values and the number of clusters, so this paper uses the ACC algorithm to determine them and then globally optimizes the data based on the idea of limited coverage to achieve better clustering results. Experiments show that the improved algorithm, the CBDO algorithm, obtains better results than the K-means and K-prototype algorithms.
(3) For the problem of dealing with complex shape data, this paper uses the spectral clustering algorithm. Since the similarity matrix in spectral clustering is based on the Euclidean distance, information between data objects is lost, so we adopt a manifold distance based on information entropy weighting. Experimental results show that the proposed algorithm has better clustering performance.
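
As a minimal sketch of how the Gower coefficient treats a mixed attribute object as a whole, the Python function below follows the standard unweighted Gower definition: a numerical attribute contributes its absolute difference divided by the attribute's range, a categorical attribute contributes 0 on a match and 1 on a mismatch, and the contributions are averaged over all attributes. The function name `gower_distance`, its parameters, and the sample records are illustrative assumptions and are not taken from the paper's implementation.

```python
from typing import Sequence, Union

Value = Union[float, str]


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    is_numeric: Sequence[bool],
    ranges: Sequence[float],
) -> float:
    """Unweighted Gower distance between two mixed attribute records.

    a, b        -- the two records, attribute by attribute
    is_numeric  -- True where the attribute is numerical, False where categorical
    ranges      -- max - min of each numerical attribute over the dataset
                   (ignored for categorical attributes)
    """
    total = 0.0
    for x, y, numeric, r in zip(a, b, is_numeric, ranges):
        if numeric:
            # Range-normalised absolute difference, so each term lies in [0, 1].
            total += abs(x - y) / r if r > 0 else 0.0
        else:
            # Simple matching for categorical attributes.
            total += 0.0 if x == y else 1.0
    return total / len(a)


# Hypothetical records: (age, income, colour)
record_a = (30.0, 50000.0, "red")
record_b = (40.0, 70000.0, "blue")
d = gower_distance(
    record_a,
    record_b,
    is_numeric=(True, True, False),
    ranges=(50.0, 100000.0, 0.0),
)
```

Because every attribute contributes a value in [0, 1], numerical and categorical attributes enter the same average, rather than being computed as two separate partial distances and then added.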