Improved K-means Algorithm Based On Quadtree Space Partition And Its Validity Verification

Posted on:2022-07-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y D Zhang

Full Text:PDF

GTID:2518306542463144

Subject:Computer technology

Abstract/Summary:

As a classic clustering algorithm, K-means is widely used in many data analysis fields because it is simple, effective, and scalable. However, K-means depends heavily on the choice of initial cluster centers, treats every attribute of every sample equally regardless of the data set, and is sensitive to outliers. These weaknesses lead to low accuracy and unstable results. To address them, this dissertation proposes QD-K-means, an improved K-means algorithm based on quadtree space division and density clustering. Experiments were run on two-dimensional simulated data sets and high-dimensional real data sets, comparing QD-K-means with several existing clustering algorithms. The results show that QD-K-means is more accurate than the other clustering algorithms. A comparison with existing clustering validity indices shows that the QDVI index proposed in this dissertation identifies the optimal number of clusters more accurately. The main work of this dissertation is as follows:

(1) To address the unstable performance of traditional K-means caused by random selection of the initial centers, an improved algorithm, QD-K-means, is proposed. It draws on density clustering, with a density parameter that is calculated dynamically for each data set. QD-K-means also uses quadtree space division to address the fact that K-means processes every data set in the same way: processing is tailored to the scale characteristics of each data set, which yields high-quality clustering results. The improved QD-K-means algorithm is therefore more stable.

(2) A new clustering validity index, QDVI, is proposed, which handles fuzzy data sets well. Drawing on quadtree space division, the index uses the grid position of each cluster center in place of the point position used in traditional indices. Computing the shortest path between grids makes the calculated distance between clusters more accurate and avoids the long computation time incurred when all points take part in the calculation. Using the positional relationship between grids also improves the accuracy of evaluating clustering results and measures the differences between clusters more accurately. In addition, an inhibitory factor is introduced to prevent the index from decreasing steadily as the number of clusters increases, which would make the partition meaningless; this gives QDVI a more stable evaluation effect.

(3) The improved algorithm QD-K-means and the new clustering validity index QDVI, both based on quadtree space division and density clustering, are evaluated experimentally. Comparison experiments on simulated and real data sets show that QD-K-means runs faster than traditional K-means, K-means++, K-medoids, and improved K-means, and achieves higher accuracy. Comparison experiments between QDVI and existing indices show that QDVI has better stability and robustness.
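The abstract does not give the details of QD-K-means, so the following is only a minimal sketch of the general idea it describes: partition two-dimensional data with a quadtree, estimate density per leaf cell, seed K-means from the densest cells, then run the standard Lloyd iterations. The function names, the `capacity` and `max_depth` parameters, the points-per-area density estimate, and the minimum-separation heuristic are all assumptions made for illustration, not the dissertation's method.

```python
import numpy as np

def quadtree_leaves(points, idx, lo, hi, capacity, depth, max_depth):
    """Recursively split the box [lo, hi] into four quadrants; return non-empty leaves."""
    if len(idx) <= capacity or depth == max_depth:
        return [(idx, lo, hi)] if len(idx) > 0 else []
    mid = (lo + hi) / 2.0
    right = points[idx, 0] >= mid[0]
    upper = points[idx, 1] >= mid[1]
    leaves = []
    for r in (False, True):
        for u in (False, True):
            sub = idx[(right == r) & (upper == u)]
            sub_lo = np.array([mid[0] if r else lo[0], mid[1] if u else lo[1]])
            sub_hi = np.array([hi[0] if r else mid[0], hi[1] if u else mid[1]])
            leaves += quadtree_leaves(points, sub, sub_lo, sub_hi,
                                      capacity, depth + 1, max_depth)
    return leaves

def initial_centers(points, k, capacity=8, max_depth=6):
    """Pick k seeds as centroids of the densest leaf cells (hypothetical heuristic)."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    leaves = quadtree_leaves(points, np.arange(len(points)), lo, hi,
                             capacity, 0, max_depth)
    # Assumed density estimate: points per unit area of the leaf cell.
    density = [len(i) / (np.prod(h - l) + 1e-12) for i, l, h in leaves]
    order = np.argsort(density)[::-1]
    # Assumed separation rule: seeds must be at least `sep` apart;
    # the threshold is halved until k seeds can be found.
    sep = np.linalg.norm(hi - lo) / k
    while True:
        centers = []
        for j in order:
            c = points[leaves[j][0]].mean(axis=0)
            if all(np.linalg.norm(c - p) >= sep for p in centers):
                centers.append(c)
            if len(centers) == k:
                return np.array(centers)
        if sep < 1e-9:
            raise ValueError("not enough distinct cells for k centers")
        sep /= 2.0

def kmeans(points, centers, n_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

Calling `kmeans(points, initial_centers(points, k))` on an `(n, 2)` array gives a deterministic result for a given data set, since no random seeding is involved. The sketch covers only the two-dimensional case; how the dissertation extends the partition to high-dimensional data is not stated in the abstract.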

Keywords/Search Tags:

Density clustering algorithm, Quadtree space division, Clustering validity index, K-means algorithm

Related items

1	Improving Of Clustering Algorithm And Research On Clustering Validity Index
2	Research On New Clustering Validity Index Based On Improved Clustering Algorithm
3	A Class Of Density-based Clustering Algorithms
4	A Clustering Validity Index Based On Noise Suppression And Its Application
5	Research Of Improved K-means Algorithm And New Cluster Validity Index In Cluster Analysis
6	Optimization Method Of Multi-distribution Centers Location Based On K-means Clustering Algorithm And Evidential Reasoning Approach
7	Research On Cluster Center Optimization Of K-means Algorithm
8	Research On Clustering Validity Evaluation Method Of Fuzzy C-Means Algorithm Based On Components
9	Optimal Density Clustering And Validity Analysis Of Double Statistics
10	A Kind Of Efficient Clustering Validity Index And Its Application