
Artificial Bee Colony Clustering Algorithm Based On Dynamic Neighborhood Disturbance Learning

Posted on: 2020-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: R Mu
GTID: 2428330596479289
Subject: Systems Engineering
Abstract/Summary:
With the rapid development of information technology, every industry generates large amounts of data at every moment. Discovering the patterns in these data helps people make better-informed decisions and greatly promotes the development of many fields. Research on data mining technologies has therefore become a hot research direction. Among these technologies, cluster analysis has become a focus of current research because it can discover the differences and connections within things. The main research contents of this thesis are as follows:

(1) First, the thesis introduces the research background and current state, at home and abroad, of cluster analysis, swarm intelligence optimization algorithms, and parallel computing technology. It then expounds the theoretical basis of cluster analysis in detail, introduces the basic ideas and parameter details of swarm intelligence optimization algorithms, in particular the artificial bee colony algorithm, and analyzes their advantages and disadvantages. In view of the data processing requirements of the current big data environment, the principles and functions of the big data processing platform Spark are also introduced.

(2) Then, to obtain higher-quality clustering results, the thesis improves the search ability of the artificial bee colony algorithm and proposes the Artificial Bee Colony Clustering Algorithm Based on Dynamic Neighborhood Disturbance Learning. The algorithm makes four changes. To overcome the search randomness caused by the lack of a learning mechanism in the basic artificial bee colony algorithm, it introduces a dynamic neighborhood: each individual learns from the best individual in its own neighborhood, which makes the search more directed and also avoids the convergence to local optima that over-learning from the population's best individual causes. To make the search finer, it introduces a Gaussian perturbation factor: the shape of the Gaussian function ensures that the search probability within the search range decreases gradually from near to far, and the reverse learning that the perturbation can produce strengthens the algorithm's ability to jump out of local optima. To reduce the adverse effect that the distribution of the initial population in the solution space has on the search, it applies a small-scale elimination process during initialization so that the initial population is distributed more evenly in the solution space. To improve the efficiency of the scout bee strategy, it adds a backtracking mechanism so that a scout bee exploring a new food source inherits, to some extent, the information generated during the optimization process. The improved algorithm is tested on four real data sets from the UCI database, and the experimental results show that its clustering results improve in both cluster compactness and clustering accuracy.

(3) Finally, to reduce the time overhead of the clustering algorithm, the algorithm is parallelized on the Spark parallel computing platform, with the cost of the fitness calculation distributed across multiple nodes. Comparison experiments on three sets of real data of different scales show that, on large-scale data, the parallelized algorithm takes significantly less time than the single-machine version.
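The neighborhood-guided, Gaussian-perturbed search described in (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no update formula, so the function names, the fixed ring neighborhood (the thesis's neighborhood is dynamic), the update rule x + phi * (guide - x) with phi drawn from a zero-mean Gaussian, and the sum-of-squared-errors fitness are all assumptions made for illustration. The elimination-based initialization, the scout bee backtracking, and the Spark parallelization are left out.

```python
import random


def sse(centroids, data):
    """Assumed fitness: sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in data
    )


def neighborhood_best(costs, i, radius):
    """Index of the lowest-cost food source in a ring neighborhood around i."""
    n = len(costs)
    ring = [(i + d) % n for d in range(-radius, radius + 1)]
    return min(ring, key=lambda j: costs[j])


def abc_cluster(data, k, n_sources=10, iters=50, radius=2, sigma=0.5, seed=0):
    """Hypothetical sketch: each food source is a list of k centroids."""
    rng = random.Random(seed)
    # Plain random initialization: every centroid starts at a random data point.
    sources = [[list(rng.choice(data)) for _ in range(k)] for _ in range(n_sources)]
    costs = [sse(s, data) for s in sources]
    for _ in range(iters):
        for i in range(n_sources):
            g = neighborhood_best(costs, i, radius)
            if g == i:
                # Already the neighborhood best: fall back to a random partner.
                g = rng.choice([j for j in range(n_sources) if j != i])
            # One Gaussian coefficient per coordinate, centered at zero, so
            # small steps are most likely; a negative draw moves away from
            # the guide, which is the assumed source of "reverse learning".
            candidate = [
                [x + rng.gauss(0.0, sigma) * (gx - x) for x, gx in zip(c, gc)]
                for c, gc in zip(sources[i], sources[g])
            ]
            cost = sse(candidate, data)
            if cost < costs[i]:  # greedy selection, as in basic ABC
                sources[i], costs[i] = candidate, cost
    best = min(range(n_sources), key=lambda j: costs[j])
    return sources[best], costs[best]
```

Calling `abc_cluster(points, k=2)` on a list of coordinate lists returns the best set of centroids found and its fitness. Because selection is greedy, the best fitness never gets worse from one iteration to the next. The `sse` call is also the part that a parallel version would distribute across nodes, since each point's nearest-centroid distance is independent of the others.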
Keywords/Search Tags:Data mining, clustering, artificial bee colony algorithm, parallel computing