K-means Clustering Algorithm Based on Initial Point Selection and Its Application

Posted on: 2016-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhou
Full Text: PDF
GTID: 2428330491452627
Subject: Computer technology
Abstract/Summary:
Data mining aims to discover useful information or knowledge in massive data sets, and it is already widely applied in many fields. Cluster analysis is one of the most important analytical tools in data mining. Among the many clustering algorithms, K-means is the most classical. It has the advantages of a simple idea, good convergence, and high efficiency when clustering large-scale data. However, the algorithm also has shortcomings: it is very sensitive to the initial centers, and it requires the user to specify the value of K beforehand. This thesis studies and analyzes the K-means clustering algorithm in depth and summarizes its strengths and weaknesses. To address the dependence of K-means on the selection of initial centers, it presents two improved algorithms and applies the second one to gene expression data. The research and contributions are as follows:

(1) An improved K-means clustering algorithm based on initial point selection using DNC values. The algorithm removes outliers effectively and reduces the instability of clustering results caused by randomly selected initial centers. Extensive experiments support the validity of this improved algorithm.

(2) An improved K-means algorithm based on initial point selection using weighted Euclidean distance. The algorithm improves the selection of initial cluster centers so that the selected centers represent different clusters, which overcomes the problem of randomly selected centers lying too close together. Because different features contribute to the clustering to different degrees, weighted Euclidean distance is used to assign each data object to the corresponding cluster center, which improves clustering efficiency. Extensive experiments on UCI data sets were carried out, and the compared results indicate that the improved algorithm is more efficient.

Finally, the thesis applies the improved algorithm to two gene expression data sets and achieves better results.
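The idea behind contribution (2) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual weighting scheme or selection rule, so everything below is an assumption for illustration only: the feature weights `w` are supplied by the caller, the initial centers are picked with a plain farthest-first rule so that they do not lie close together, and the function names (`weighted_distances`, `spread_initial_centers`, `weighted_kmeans`) are hypothetical.

```python
import numpy as np


def weighted_distances(X, c, w):
    """Weighted Euclidean distance sqrt(sum_j w_j * (x_j - c_j)^2) from every row of X to c."""
    return np.sqrt((((X - c) ** 2) * w).sum(axis=1))


def spread_initial_centers(X, k, w):
    """Farthest-first selection (an assumed stand-in, not the thesis's rule).

    The first center is the data point closest to the overall mean; each
    further center is the point whose distance to its nearest already
    chosen center is largest, which keeps the centers apart.
    """
    first = int(np.argmin(weighted_distances(X, X.mean(axis=0), w)))
    chosen = [first]
    min_dist = weighted_distances(X, X[first], w)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, weighted_distances(X, X[nxt], w))
    return X[chosen].astype(float)


def weighted_kmeans(X, k, w, n_iter=100):
    """K-means whose assignment step uses the weighted Euclidean distance."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    centers = spread_initial_centers(X, k, w)
    for _ in range(n_iter):
        # Assign each point to its nearest center under the weighted distance.
        dists = np.stack([weighted_distances(X, c, w) for c in centers], axis=1)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its members; keep empty clusters unchanged.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Toy data: two well-separated groups, equal (hypothetical) feature weights.
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    labels, centers = weighted_kmeans(X, k=2, w=[1.0, 1.0])
    print(labels)
    print(centers)
```

Because the selection here is deterministic, repeated runs on the same data give the same partition, which is the kind of stability the abstract attributes to careful initial point selection. How the weights should be derived from the data is left open here, since the abstract does not say.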
Keywords/Search Tags:Data Mining, Clustering, K-means Algorithm, DNC, Weighted processing