Font Size: a A A

Research On Fast K-means Clustering Algorithm Based On The Partition Of Multi-granularity Granular Ball

Posted on:2021-04-07Degree:MasterType:Thesis
Country:ChinaCandidate:D W PengFull Text:PDF
GTID:2428330614458418Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The k-means algorithm is a classic and widely used unsupervised algorithm in machine learning.The research on the efficiency improvement is very valuble when the kmeans algorithm deals with large-scale clustering scenarios.Thus,two types of improved solutions were proposed in this thesis to improve the efficiency.The first type is an accelerated exact k-means algorithm based on granular sphere partitioning,and the second one is an approximate k-means acceleration algorithm based on neighbor information.The first improved accelerated exact k-means algorithm proposed in this thesis is called Ball k-means,which uses the model of granular-ball to describe a cluster,focusing on reducing the the unnecessary point-center distance calculation,and then reducing the runtime to improve the efficiency.The the neighbor searching of Ball k-means can accurately find the neighboring granular-balls for each granular-ball,so that a sample point only needs to calculate the distances to the cluster centers of its neighboring granular-balls;In addition,each granular-ball can be divided into stable area and active area,and the latter one can be further divided into many annular areas.The data points in the stable area will not change,while the points in the annular area will be adjusted in some neighboring granular-balls in the current iteration round;Moreover,a method is designed to reduce the Euclidean distance calculation between the centers of the granular-balls in each iteration;As more and more granular-balls will be unchanged in the later iterations gradually,this thesis proposes a method to find these unchanged granular-balls,and proves that the granular-balls that tend to be unchanged do not need to be re-established,the data samples in the granular-balls do not need to participate in the assignment step in the current iteration round;Finally,we show that our algorithm outperforms the state-of-the-art accelerated exact k-means algorithms by theoretical analysis and experimental results on real data sets and artificially synthesized data sets,especially in deaing with large k problems.The simple design and no additional parameters of Ball k-means make it an all-around replacement for the Lloyd k-means algorithm.In some applications,the require of the quality of the clustering results is not strict,that is,it's acceptable that some clustering results are not accurate.But the clustering results need to be obtained quickly.Therefore,this thesis proposes the second type improved algorithm fast k-means clustering based on the neighbor information,which is an approximate k-means algorithm.Based on observations,during the k-means iteration rounds,the data samples in each granular-ball are only adjusted among the range of nearby granular-balls.Therefore,this thesis proposes a localization strategy which aims to reduce the range of the granular-balls in the assignment step;In addition,in order to maintain the clustering quality of the proposed algorithm,this thesis proposes a neighbor granular-balls update method,using the neighbors of the neighbors as candidate neighbors,and then performing the assignment step with the candidate neighbor granular-balls in subsequent iterations,thereby avoiding the quality loss of the final clustering result which results from maintaining the initial neighbor relationship.Finally,we evaluate the validity of this algorithm on some the real datasets.This proposal has achieved hundreds of speedups over Lloyd k-means algorithm in multiple data sets with only 1.10% of the clustering quality loss on average.This thesis proposes two fast and efficient k-means clustering methods,which provide two different solutions for fast clustering needs in large-scale clustering scenes.
Keywords/Search Tags:granular-ball, stable area, active area, localization strategy, neighbor cluster update
PDF Full Text Request
Related items