Font Size: a A A

Research And Distributed Implementation Of Cluster Algorithm Combined AFSA With K-means

Posted on:2017-03-17Degree:MasterType:Thesis
Country:ChinaCandidate:S H ChenFull Text:PDF
GTID:2308330509452526Subject:Communication and Information System
Abstract/Summary:PDF Full Text Request
K-means algorithm is a typical, distance-based clustering algorithm. Simpleness, nearly linear time complexity makes it more suitable for mining large data sets. Sum of the squared errors is criterion function of k-means algorithm. Function value decreases in each iteration.Therefore, k-means clustering can be classified as an optimization problem. Swarm intelligence optimization algorithm is a new evolutionary computation techniques. Using the advantages of population and distributed search, it can quickly get a best global solution without knowing model situation. As a representative of swarm intelligence optimization algorithm, AFSA is based on biomimetic behavior. In order to achieve optimization, artificial fish is constructed to mimic foraging, aggregation and rear-end behavior of fish. AFSA is converges fast and does not require strict model.Computational complexity of AFSA becomes larger with the increase of the number of artificial fish. Besides, max step is changeless, which affects the convergence speed and accuracy in the late process of optimization. Therefore, elimination mechanism and adaptive max step strategy is proposed in this paper. Elimination mechanism is based on the fitness function. After a certain amount of iteration, artificial fishes which has small fitness is eliminated so as to reduce the number of artificial fish and the amount of calculation. In the early process of optimization,adaptive max step strategy is to get a large step to speed up the convergence. In the late process of optimization, adaptive max step strategy is to get a small step to improve the accuracy of optimization.In this paper, AFSA is combined with k-means for large-scale data mining. The aim is to use the global optimality of AFSA to solve the problem that k-means algorithm is sensitive to the initial centers and easy to fall into local optimization. The main work includes:a kind of encoding,which contains the number of clusters and cluster centers, designed to increase the probability of finding the global optimum cluster centers. We use an artificial fish to be on behalf of an initial cluster centers. Initial clustering center of k-means is introduced to fitness function of AFSA so that the artificial fish can automatically determine the approximate optimal initial cluster centers in the process of optimization. Then the approximate optimal initial cluster centers is regarded as the initial value of k-means for searching locally and detailly to improve accuracy.Faced with large-scale data, the processing capabilities of traditional algorithm is not satisfactory, so how to mine efficiently has become a research focus. In this paper MapReduce parallel computing framework is used to deal with AFSA and k-means algorithm. Under the premise of ensuring effect of each algorithm, distributed implementation of two algorithms is aim to improve the expansibility and efficiency.
Keywords/Search Tags:clustering analysis, k-means algorithm, AFSA, mining large-scale data
PDF Full Text Request
Related items