As a classic clustering algorithm, K-means is widely used in data analysis because it is simple, effective and scalable. However, K-means depends heavily on the choice of initial cluster centers, treats every attribute of every sample equally regardless of the data set, and is sensitive to outliers. As a result, its accuracy is low and its results are unstable. To address these problems, this dissertation proposes QD-K-means, an improved K-means algorithm based on quadtree space division and density clustering. Experiments are carried out on two-dimensional simulated data sets and high-dimensional real data sets, and QD-K-means is compared with several existing clustering algorithms. The experimental results show that QD-K-means is more accurate than the other clustering algorithms. A comparison with existing clustering validity indices shows that the QDVI index proposed in this dissertation identifies the optimal number of clusters more accurately.

The main work of this dissertation is as follows:

(1) Drawing on the idea of density clustering, this dissertation addresses the unstable performance of the traditional K-means algorithm that is caused by the random selection of initial values. Because the density parameter can be calculated dynamically for each data set, an improved algorithm, QD-K-means, is proposed. Traditional K-means processes every data set in the same way; QD-K-means instead uses quadtree space division to adapt its processing to the scale characteristics of each data set, and thereby obtains high-quality clustering results. The improved QD-K-means algorithm proposed in this dissertation is therefore more stable.

(2) This dissertation proposes a new clustering validity index, QDVI, which handles fuzzy data sets well. Building on quadtree space division, it replaces the point position used in traditional indices with the grid position of each cluster center. Calculating the shortest path between grids makes the computed distance between clusters more accurate and avoids the long running time incurred when every point participates in the calculation. Using the positional relationship between grids also improves the accuracy with which clustering results are evaluated and with which the differences between clusters are measured. In addition, an inhibitory factor is introduced to prevent the index from decreasing steadily as the number of clusters increases, which would make the partition meaningless. This gives the QDVI index a more stable evaluation effect.

(3) The improved QD-K-means algorithm and the new QDVI validity index, both based on quadtree space division and density clustering, are evaluated experimentally. Comparison experiments on simulated and real data sets show that the QD-K-means clustering algorithm runs faster and is more accurate than the traditional K-means algorithm, K-means++, K-medoids and an improved K-means algorithm. Comparison experiments between the QDVI index and existing indices show that the proposed QDVI index has better stability and robustness.
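To make the initialization idea in (1) concrete, the following is a minimal sketch, not the dissertation's actual QD-K-means procedure, whose details are not given in this abstract. It assumes two-dimensional data, a quadtree that splits a cell whenever it holds more than `capacity` points, a leaf density defined as point count divided by cell area, and a farthest-first choice among the densest half of the leaves. All function names, parameters and thresholds here are hypothetical and chosen only for illustration.

```python
import numpy as np


def quadtree_leaves(points, idx, lo, hi, capacity, depth=0, max_depth=8):
    """Split the rectangle [lo, hi) into quadrants until a cell holds
    at most `capacity` points (or `max_depth` is reached).
    Returns a list of non-empty leaves as (point indices, lo, hi)."""
    if len(idx) <= capacity or depth >= max_depth:
        return [(idx, lo, hi)] if len(idx) > 0 else []
    mid = (lo + hi) / 2.0
    right = points[idx, 0] >= mid[0]
    upper = points[idx, 1] >= mid[1]
    leaves = []
    for r in (False, True):
        for u in (False, True):
            mask = (right == r) & (upper == u)
            child_lo = np.where([r, u], mid, lo)
            child_hi = np.where([r, u], hi, mid)
            leaves += quadtree_leaves(points, idx[mask], child_lo, child_hi,
                                      capacity, depth + 1, max_depth)
    return leaves


def density_seeds(points, k, capacity=10):
    """Pick k initial centers from dense quadtree leaves (assumed heuristic)."""
    lo = points.min(axis=0)
    hi = points.max(axis=0) + 1e-9  # keep every point strictly inside the box
    leaves = quadtree_leaves(points, np.arange(len(points)), lo, hi, capacity)
    density = np.array([len(i) / np.prod(h - l) for i, l, h in leaves])
    centroids = np.array([points[i].mean(axis=0) for i, _, _ in leaves])
    # Keep the densest half of the leaves so sparse outlier cells are ignored.
    keep = np.argsort(-density)[: max(k, len(leaves) // 2)]
    centroids = centroids[keep]
    seeds = [centroids[0]]  # start from the densest leaf
    while len(seeds) < k:
        # Farthest-first: take the dense leaf farthest from all chosen seeds.
        d = np.linalg.norm(
            centroids[:, None, :] - np.array(seeds)[None, :, :], axis=2
        ).min(axis=1)
        seeds.append(centroids[d.argmax()])
    return np.array(seeds)


def lloyd(points, centers, n_iter=50):
    """Standard K-means (Lloyd) iterations from the given initial centers."""
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


def qd_kmeans_sketch(points, k, capacity=10):
    """Deterministic density-seeded K-means on 2D points (illustrative only)."""
    return lloyd(points, density_seeds(points, k, capacity))
```

Because no random numbers are drawn, repeated calls such as `qd_kmeans_sketch(points, 3)` on the same array return the same centers, which illustrates the kind of run-to-run stability that the abstract attributes to density-based initialization. If the quadtree yields fewer than `k` leaves, this sketch may return duplicate seeds; a fuller implementation would need to handle that case.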