Font Size: a A A

Research On Clustering Algorithm Based On Improved Particle Swarm Optimization

Posted on:2014-09-27Degree:MasterType:Thesis
Country:ChinaCandidate:X H XieFull Text:PDF
GTID:2298330431489530Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the rapid development of advanced database systems and Internet, a variety of complex forms of data grow explosively, which calls for an effective tool to deal with it efficiently. Clustering is an important research area of Data Mining, which is an unsupervised learning process of dividing dataset into different classes according to similarity measures. K-means is a typical clustering algorithm based on partition which is simple and easy to implement, converges fast and has a strong local searching capability. It has been widely used in many fields such as remote sensing image processing, pattern recognition and so on. Therefore, it is meaningful to study k-means algorithm.However, k-means algorithm has great dependency on initial cluster centers. Different initial centers may lead to different results, which causes great fluctuations and lowers the clustering quality. Moreover traditional k-means algorithm is easy to fall into local minima, failing to achieve global optimal solution. In order to solve the above problems, an optimized clustering algorithm of k-means based on improved particle swarm optimization (PSO) is proposed, which utilizes the strong searching capabilities of PSO. First PSO algorithm is adopted to search for k global optimal clustering centers which serve as k-means algorithm’s initial centers. Then k-means algorithm is utilized to search for the final cluster results. The improvement and optimization of the algorithm is as follows:(1) determining the occasion of the improved algorithm transferred from PSO algorithm to k-means algorithm by the fitness variance of the particle swarm;(2) updating the inertia weight dynamically and adding a flight time factor so as to strengthen the global searching capability of the PSO algorithm;(3) using variants to monitor the optimal value condition of every particle and the particle swarm in real time, in order to determine whether the premature convergence phenomenon of the particle swarm appears;(4) executing mutation operations on those particles that converge prematurely and stuck in local minima, which makes the particles get rid of the local minima in time by increasing the diversity of the particle swarm. Furthermore, a parallel strategy for the improved clustering algorithm based on MapReduce is proposed, solving the problems that the serial improved k-means algorithm’s execution is inefficient and it can’t meet the performance requirements when dealing with massive datasets.Experimental results show that the proposed algorithm has higher clustering accuracy rates, reduces the volatility of the cluster results and improves the clustering quality of k-means algorithm to some extent. Meanwhile the proposed algorithm keeps the particle swarm from prematurity and makes it possible for the PSO algorithm to converge faster during the latter part of the algorithm. It is also shown that the parallization of the above improved clustering algorithms based on MapReduce is feasible and effective, and the parallel strategy has good scalability and efficiency.
Keywords/Search Tags:clustering, k-means, particle swarm optimization, globaloptimum, MapReduce
PDF Full Text Request
Related items