Study On Improvement Of K-means Clustering Algorithm

Posted on:2019-11-02

Degree:Master

Type:Thesis

Country:China

Candidate:J Guo

Full Text:PDF

GTID:2428330563456424

Subject:Public Security Technology

Abstract/Summary:

With the development of information technology, the human ability to collect, store, transmit, and process data has increased rapidly. Every area of human society, including commerce, science, engineering, medicine, and daily life, has accumulated large amounts of data, and powerful, general-purpose tools are needed to analyze and use that data effectively. Machine learning and data mining meet this need of the data era; both have developed rapidly and received extensive attention. Cluster analysis is widely used in machine learning, data mining, pattern recognition, and image processing. Among the many clustering algorithms, K-means has become one of the most widely used because of its simplicity, efficiency, and adaptability. However, the traditional K-means algorithm has two prominent problems: the selection of the initial cluster centers and the determination of the number of clusters.

For the initial cluster center selection problem, a new K-means initial cluster center optimization algorithm based on Principal Component Analysis (PCA) is proposed, drawing on Rahman's Sum Score algorithm. The method first uses PCA to reduce the original multidimensional data to one-dimensional data. Second, the one-dimensional data is sorted in ascending order. Third, the sorted data is divided into k subsets. Then the multidimensional data is divided into k subsets according to the correspondence between the one-dimensional and multidimensional data, and the center of each of the k subsets is computed. Finally, the k data points in the original multidimensional data that are nearest to the centers of the k subsets are taken as the initial cluster centers. In comparisons with other optimization algorithms on artificial datasets and UCI datasets, the experimental results show that the new algorithm can significantly improve clustering quality.

For the problem of determining the number of clusters, a new method for determining the optimal number of clusters for K-means based on potential stability is proposed. This method makes use of the potential stability of the dataset: it selects two different sets of initial cluster centers for clustering and then compares whether the two clustering results are the same in order to obtain the optimal number of clusters. Compared on the UCI datasets with the method that determines the optimal number of clusters using internal cluster validity indices, the new method obtains the correct number of clusters more accurately.
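
The PCA-based initialization steps described in the abstract can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the thesis's actual implementation: the function name `pca_initial_centers` is hypothetical, the sorted data is assumed to be split into k near-equal parts (the abstract does not say how the split points are chosen), the "center" of a subset is assumed to be its mean, and the nearest point is searched over the whole dataset.

```python
import numpy as np


def pca_initial_centers(X, k):
    """Pick k initial K-means centers by splitting data along its first principal component.

    Hypothetical helper: a sketch of the steps listed in the abstract.
    """
    X = np.asarray(X, dtype=float)
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of samples")

    # Step 1: reduce to one dimension by projecting the centered data
    # onto the first principal component (first right singular vector).
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]

    # Step 2: sort the one-dimensional values in ascending order.
    order = np.argsort(scores, kind="stable")

    # Step 3: divide the sorted indices into k subsets.
    # Assumption: near-equal sizes; the abstract does not specify the split rule.
    groups = np.array_split(order, k)

    centers = []
    for idx in groups:
        # Steps 4-5: map the subset back to the multidimensional points
        # and compute its center (assumed here to be the mean).
        mean = X[idx].mean(axis=0)
        # Step 6: take the data point nearest to that center.
        # Assumption: the search runs over the whole dataset.
        nearest = np.argmin(np.linalg.norm(X - mean, axis=1))
        centers.append(X[nearest])
    return np.array(centers)
```

The returned array has shape (k, n_features), so it could be passed as explicit starting centers to a K-means implementation, for example as the `init` argument of scikit-learn's `KMeans` with `n_init=1`.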

Keywords/Search Tags:

K-means clustering algorithm, principal component analysis, initial clustering centers, potential stability, number of clusters

Related items

1	Improvements And Implementation Of K-means Clustering Algorithm
2	Research On Improvement Of K-means Clustering Algorithm
3	Algorithms Implementation Of Determining The Number Of Clusters And Initial Cluster Centers For Mixed Data
4	The Selection And Improvement Of K-means’s Initial Clustering Centers
5	Research On The Selection Of Initial Cluster Centers In K-means Algorithm
6	Research And Application On Determining Optimal Number Of Clusters In Cluster Analysis
7	Research On Multi-view K-means Clustering
8	The Research And Application Of Text Clustering Based On Improved K-means Algorithm
9	Some Problems Of Determining The Optimal Number Of Clusters In Clustering Analysis
10	Research On Determining Optimal Number Of Clusters In Cluster Analysis