K-means Clustering Algorithm Based On Initial Point Selection And Its Application

Posted on:2016-04-29

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhou

Full Text:PDF

GTID:2428330491452627

Subject:Computer technology

Abstract/Summary:

Data mining is the discovery of useful information or knowledge from massive data, and it is already widely applied in many fields. Cluster analysis is one of the most important analytical tools in data mining. Among the many cluster analysis algorithms, K-means is the most classical. It has the advantages of a simple idea, good convergence, and high efficiency when clustering large-scale data. However, it also has shortcomings: it is very sensitive to the initial centers, and it requires the user to specify the value of K beforehand, among others. This thesis studies and analyzes the K-means clustering algorithm in depth and summarizes its strengths and weaknesses. To address the dependence of K-means on the selection of initial centers, it presents two improved algorithms and applies the second to gene expression data. The research and contributions are as follows:

(1) An improved K-means clustering algorithm based on initial point selection using DNC values. This algorithm removes outliers effectively and reduces the instability of clustering results caused by randomly selecting initial centers. Extensive experiments demonstrate the validity of the improved algorithm.

(2) An improved K-means algorithm based on initial point selection using weighted Euclidean distance. This algorithm improves the selection of initial cluster centers so that the selected centers represent different clusters, overcoming the problem of randomly selected centers lying too close together. Because different features contribute to the clustering to different degrees, weighted Euclidean distance is used to assign each data object to the corresponding cluster center, which improves clustering efficiency. Experiments on UCI data sets, with the results compared and analyzed, show that the improved algorithm is more efficient.

Finally, the improved algorithm is applied to two gene expression data sets and achieves better results.
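
The abstract does not say how the initial centers or the feature weights are actually computed, so the following is only a minimal sketch of the general idea behind the second contribution, not the thesis's method. It assumes a farthest-first (maximin) seeding rule as a stand-in for the thesis's selection step, and it takes the feature weights as a user-supplied list. The names `weighted_distance`, `spread_out_centers`, and `kmeans` are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: sqrt(sum_d w_d * (a_d - b_d)^2)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def spread_out_centers(points, k, weights):
    """Pick k initial centers that lie far apart (farthest-first seeding).

    Assumption: the first center is the data point nearest the overall mean;
    each further center is the point farthest from its nearest chosen center.
    """
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    centers = [min(points, key=lambda p: weighted_distance(p, mean, weights))]
    while len(centers) < k:
        centers.append(max(
            points,
            key=lambda p: min(weighted_distance(p, c, weights) for c in centers),
        ))
    return centers


def kmeans(points, k, weights, max_iter=100):
    """Standard K-means loop using the seeding and distance defined above."""
    centers = spread_out_centers(points, k, weights)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: weighted_distance(p, centers[j], weights))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Illustrative data only: two well-separated groups in two dimensions.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2, weights=[1.0, 1.0])
```

Because the seeding is deterministic, repeated runs on the same data give the same clustering, which is the kind of stability the abstract attributes to choosing the initial centers deliberately rather than at random.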

Keywords/Search Tags:

Data Mining, Clustering, K-means Algorithm, DNC, Weighted processing

Related items

1	Study On K-means Clustering Algorithm Based On Summarized Information Of RTVU Students
2	Improvement And Application Of K-means Algorithm
3	Semi-supervised K-means Clustering Algorithm In Data Mining
4	The Research Of Clustering Data Mining Based On Swarm Intelligence Algorithm
5	Semi Supervised Clustering Algorithm And Its Application And Research
6	K-means Research And Improvement Based On Particle Group Technology
7	Clustering Data Mining Applications In Department Store And K-means Clustering Algorithm Improvement
8	Research And Improvement Of K-means Clustering Algorithm
9	The Improvement On The Fuzzy C-means Algorithm
10	Improvement And Empirical Study Of K-means Clustering Algorithm On Panel Data Analysis