
Research On Integration Of K-means Algorithm And Intelligent Algorithm

Posted on: 2015-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Guan
GTID: 2268330428465518
Subject: Computer software and theory
Abstract/Summary:
The basic purpose of data mining is to extract information or knowledge that is directly or potentially valuable to the user from a mass of incomplete and noisy data. A notable feature of cluster analysis is that it needs no prior knowledge or information: it groups objects into clusters according to their attributes while satisfying, as far as possible, the requirements of high cohesion within a cluster and low coupling between clusters. Clustering is therefore an unsupervised learning method. With the rapid development of cluster analysis techniques, clustering is now widely used in scientific research and everyday life, and it is one of the important branches of the data mining field.

The K-means algorithm is a typical clustering algorithm that has received a lot of attention because it is simple and easy to implement. It has disadvantages, however. It is sensitive to the selection of the initial center points, so different initializations on the same data set may produce different results, especially on irregular or large data sets.

The genetic algorithm is an intelligent algorithm that imitates the biological mechanisms of natural selection and evolution. It takes the chromosome as its basic unit of operation, applies crossover, mutation, and duplication among chromosomes, and finally selects the optimal individual using an evaluation function set in advance. The genetic algorithm offers implicit parallelism, robustness, problem independence, and global optimization ability, and it has therefore been widely studied and applied.

Cloud computing is an inevitable product of the big data era.
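
The sensitivity of K-means to its initial centers, described above, can be seen on a tiny data set. The following is a minimal sketch, not code from the thesis: the function names `sq_dist` and `kmeans`, the four sample points, and both sets of starting centers are assumptions chosen for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Plain (Lloyd's) K-means from the given initial centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(xs) / len(members) for xs in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Four corners of a wide rectangle (illustrative data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Same data, two different initializations.
good_centers, good_sse = kmeans(points, [(0, 0), (4, 0)])
bad_centers, bad_sse = kmeans(points, [(0, 0), (0, 1)])

print(good_centers, good_sse)
print(bad_centers, bad_sse)
```

The first initialization converges to the left/right split with an SSE of 1.0, while the second stays stuck on a top/bottom split with an SSE of 16.0. Local optima of this kind are what a global search such as the genetic algorithm is intended to escape.
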
MapReduce is the computing model of the Hadoop platform, a free and open-source cloud computing platform modeled on the architecture of Google's cloud computing platform. In recent years, most research on cloud computing has been carried out on this platform. This thesis designs and implements a parallel genetic K-means algorithm on the Hadoop platform, which avoids the cumbersome design of MPI-based parallelization and improves the convergence efficiency and accuracy of clustering.

The ant colony algorithm has been a focus of research in recent years. Its main principle is that ants deposit pheromone as they move, and other ants then follow the differing pheromone concentrations to find the best path or solution. Beyond the regular application of the ant algorithm to familiar problems such as the TSP, clustering algorithms based on ant foraging and on corpse piling are attracting more and more attention. A clustering algorithm based on ant foraging can often obtain better clustering results by exploiting the swarm intelligence of ants, but in the early stage of clustering the lack of pheromone makes the algorithm converge slowly. To address this shortcoming, this thesis proposes to preprocess the data set, using methods based on density and distance to select the initial cluster centers, and then to generate a non-uniform initial pheromone distribution, which speeds up the convergence of the algorithm.
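
The abstract does not spell out how density and distance are combined when choosing the initial cluster centers. The sketch below shows one plausible reading and is an assumption throughout: density is taken as the number of points within a fixed `radius`, the first center is the densest point, and each later center maximizes density times distance to the nearest center already chosen. The function name `density_distance_centers` and the sample data are hypothetical.

```python
import math


def density_distance_centers(points, k, radius):
    """Pick k initial centers that are both dense and far apart.

    Assumed scoring rule (not taken from the thesis):
    score(p) = density(p) * distance from p to the nearest chosen center.
    """
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    # Density of a point = number of points (itself included) within radius.
    density = [sum(1 for q in points if dist(p, q) <= radius) for p in points]

    # First center: the densest point.
    chosen = [max(range(len(points)), key=lambda i: density[i])]

    # Remaining centers: dense points that lie far from all chosen centers.
    while len(chosen) < k:
        def score(i):
            return density[i] * min(dist(points[i], points[j]) for j in chosen)

        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=score))

    return [points[i] for i in chosen]


# Two dense groups plus one isolated outlier (illustrative data).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 30),
]
centers = density_distance_centers(points, k=2, radius=1.5)
print(centers)
```

On this data the outlier at (5, 30) is the farthest point from the first center, but its low density keeps it from being selected, so the two centers fall in the two dense groups. Centers chosen this way could then seed the non-uniform initial pheromone distribution mentioned above; how the thesis does that is not stated in the abstract.
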
Keywords/Search Tags:genetic algorithm, ant colony algorithm, Hadoop, MapReduce, parallel, k-means algorithm