
Research On Improved K-means Clustering Algorithm And Its Application

Posted on: 2015-02-10  Degree: Master  Type: Thesis
Country: China  Candidate: K Wang
GTID: 2298330467986764  Subject: Detection Technology and Automation
Abstract/Summary:
Data mining is a technology developed to help people fully understand, and make effective use of, the information and knowledge hidden in data. Clustering analysis, an important unsupervised method in data mining, can be roughly divided into the following categories: partition-based, hierarchical, grid-based, and density-based clustering methods. The k-means algorithm is a partition-based clustering algorithm; because it is simple and efficient, it is widely used in remote sensing. However, as remote sensing technology develops, remote sensing images contain ever larger amounts of data, and the results of k-means depend on the choice of initial cluster centers. Building on an analysis of the k-means algorithm, this thesis proposes improvements that address these problems in two respects:
(1) When k-means processes large-scale data, it consumes considerable memory and computation and cannot handle the data effectively. To address this, a parallel k-means clustering method based on the MapReduce programming model is proposed. The large-scale data set is first divided into blocks, which are assigned to the sub-nodes of the cluster; after the sample data are clustered, the cluster centers are updated synchronously and clustering is repeated until the cluster centers no longer change. Simulation results on 4 data sets from the UCI repository verify the effectiveness of the parallel k-means algorithm and show that the method is effective for remote sensing image processing; large remote sensing data sets are used to test the speedup and scalability of the parallel cluster.
(2) The k-means algorithm is affected by the initial cluster centers and by noise in the data, which makes its clustering results unstable.
To address this problem, a method based on the differential evolution algorithm is first proposed to obtain better initial cluster centers. Then, because different samples influence the clustering to different degrees, a weighted Euclidean distance is proposed to reduce the adverse effects of noisy data and other uncertainties and to obtain stable clustering results. Verification on 4 UCI data sets shows that the proposed algorithm gives more stable clustering results; in addition, applying the method to real remote sensing images shows that it processes remote sensing image data better and divides object types reasonably.
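The parallel scheme in (1) follows the usual MapReduce decomposition of k-means. Below is a minimal single-process Python sketch of that decomposition, not the author's Hadoop implementation. The map step assigns each point in a block to its nearest center and emits per-center partial sums and counts (a combiner-style local aggregation assumed here), and the reduce step merges the partials into new centers. The function names, the round-robin block split, and the tolerance are all hypothetical; on a real cluster each `map_block` call would run on a separate node.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Dict[int, Tuple[List[float], int]]  # center id -> (coordinate sums, count)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(p: Point, centers: List[Point]) -> int:
    return min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))


def split_into_blocks(data: List[Point], n_blocks: int) -> List[List[Point]]:
    """Hypothetical round-robin split standing in for distributed input splits."""
    return [data[i::n_blocks] for i in range(n_blocks)]


def map_block(block: List[Point], centers: List[Point]) -> Partial:
    """Map + local combine: assign each point to its nearest center and
    accumulate per-center partial sums and counts for this block only."""
    partial: Partial = {}
    for p in block:
        k = nearest_center(p, centers)
        sums, count = partial.get(k, ([0.0] * len(p), 0))
        partial[k] = ([s + x for s, x in zip(sums, p)], count + 1)
    return partial


def reduce_partials(partials: List[Partial], centers: List[Point]) -> List[Point]:
    """Reduce: merge the partial sums from all blocks into new centers."""
    totals: Partial = {}
    for partial in partials:
        for k, (sums, count) in partial.items():
            acc, n = totals.get(k, ([0.0] * len(sums), 0))
            totals[k] = ([a + s for a, s in zip(acc, sums)], n + count)
    new_centers: List[Point] = []
    for k, old in enumerate(centers):
        if k in totals:
            sums, n = totals[k]
            new_centers.append(tuple(s / n for s in sums))
        else:
            new_centers.append(old)  # an empty cluster keeps its previous center
    return new_centers


def parallel_kmeans(data: List[Point], initial_centers: List[Point],
                    n_blocks: int = 4, max_iter: int = 100, tol: float = 1e-12) -> List[Point]:
    """Driver: repeat synchronous map/reduce rounds until the centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    blocks = split_into_blocks(data, n_blocks)
    for _ in range(max_iter):
        partials = [map_block(b, centers) for b in blocks]  # parallel on a real cluster
        new_centers = reduce_partials(partials, centers)
        moved = max(squared_distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if moved <= tol:
            break
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(parallel_kmeans(data, [(0.0, 0.0), (10.0, 10.0)], n_blocks=2))
```

On the toy data above the result is the same for any number of blocks, with centers of approximately (1/3, 1/3) and (31/3, 31/3), because only sums and counts cross block boundaries. When measuring the speedup mentioned in (1), the conventional definition is S_p = T_1 / T_p, the single-node run time divided by the run time on p nodes.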
Keywords/Search Tags: k-means, Clustering, MapReduce, Differential Evolution
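Part (2) of the abstract leaves several details open: the differential evolution (DE) variant, its fitness function, and how the distance weights are derived. The Python sketch below fills these gaps with labeled assumptions. It uses DE/rand/1/bin, a standard DE scheme, to search over flattened sets of k centers; it takes the within-cluster sum of squared weighted distances as fitness; and it uses the best individual to seed a k-means loop with the weighted Euclidean distance d_w(x, c) = sqrt(sum_j w_j (x_j - c_j)^2). The abstract's weights may well be per-sample rather than per-feature. Here the per-feature weights `w` are simply supplied by the caller, as an assumption. All function names and the defaults (F = 0.5, CR = 0.9, population 20) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def weighted_distance(x: Sequence[float], c: Sequence[float], w: Sequence[float]) -> float:
    """Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - c_j)^2)."""
    return math.sqrt(sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w)))


def nearest(x: Sequence[float], centers: List[Vector], w: Sequence[float]) -> int:
    return min(range(len(centers)), key=lambda i: weighted_distance(x, centers[i], w))


def sse(centers: List[Vector], data: List[Sequence[float]], w: Sequence[float]) -> float:
    """Within-cluster sum of squared weighted distances (DE fitness, lower is better)."""
    return sum(min(weighted_distance(x, c, w) for c in centers) ** 2 for x in data)


def unflatten(vec: Vector, k: int, d: int) -> List[Vector]:
    return [vec[i * d:(i + 1) * d] for i in range(k)]


def de_initial_centers(data, k, w, pop_size=20, generations=50, F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin over flattened k*d center vectors; returns the best k centers."""
    rng = random.Random(seed)
    d = len(data[0])
    dim = k * d
    # Per-gene bounds: the data's bounding box, repeated for each of the k centers.
    lo = [min(x[j] for x in data) for j in range(d)] * k
    hi = [max(x[j] for x in data) for j in range(d)] * k

    def fitness(vec: Vector) -> float:
        return sse(unflatten(vec, k, d), data, w)

    # Each initial individual is k distinct data points chosen at random.
    pop = [[v for x in rng.sample(data, k) for v in x] for _ in range(pop_size)]
    fit = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    gene = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                    gene = min(max(gene, lo[j]), hi[j])  # clip to the bounding box
                else:
                    gene = pop[i][j]
                trial.append(gene)
            f_trial = fitness(trial)
            if f_trial <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return unflatten(pop[best], k, d)


def weighted_kmeans(data, centers, w, max_iter=100) -> Tuple[List[Vector], List[int]]:
    """k-means whose assignment step uses the weighted Euclidean distance."""
    centers = [list(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [nearest(x, centers, w) for x in data]
        new_centers = []
        for i, c in enumerate(centers):
            members = [x for x, lab in zip(data, labels) if lab == i]
            # With positive per-feature weights the mean still minimizes the
            # weighted squared distance, so the update is the ordinary mean.
            new_centers.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    w = [1.0, 1.0]
    init = de_initial_centers(data, k=2, w=w, seed=1)
    print(weighted_kmeans(data, init, w))
```

Seeding k-means with the DE result, rather than with random points, is what targets the instability described in (2): the DE search explores many candidate center sets before the local k-means refinement starts. Choosing `w` (for example, down-weighting noisy features) is left to the caller in this sketch.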