With a large number of social resources being networked and digitized, data has permeated all walks of life and become an important factor of production. Data mining is an effective means of extracting valuable information from complex data to support production and learning, and its importance cannot be overstated. The K-medoids clustering algorithm is an effective data mining technique. It inherits the advantages of the K-means algorithm while reducing K-means' over-sensitivity to noise and outliers, and it has attracted wide attention from researchers. However, K-medoids is still sensitive to the choice of initial cluster centers and to outliers. To address these problems, this thesis proposes a K-medoids algorithm combined with an Improved Artificial Bee Colony algorithm (IABCK-medoids). The main research work of this thesis is as follows:

(1) The Artificial Bee Colony (ABC) algorithm is prone to premature convergence, easily falls into local optima, and converges slowly in its later stages. To overcome these shortcomings, an Improved Artificial Bee Colony algorithm (IABC) is proposed. Tent chaotic mapping is used to generate the initial bee colony, which enhances population diversity. To balance the exploitation and exploration abilities of the algorithm, a global-best (Gbest) factor is introduced into the search process and a crossover operation is added, which makes the bees' search more purposeful. Experiments on standard test functions verify that IABC achieves better optimization results and faster convergence.

(2) Exploiting the strong performance of IABC in finding global optima, and improving the robustness of K-medoids by introducing a Gaussian kernel function into its objective function, the IABC algorithm is combined with the K-medoids algorithm to form the IABCK-medoids algorithm. Experiments on UCI datasets show that IABCK-medoids achieves better clustering results and is more robust.

(3) To test the ability of IABCK-medoids to process large-scale data, the algorithm is parallelized under the MapReduce framework. A comparison on three indexes, Cluster Accuracy, Adjusted Rand Index, and speedup, shows that the algorithm retains good computing performance in a big data cluster environment.
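Tent chaotic initialization spreads the initial food sources more evenly over the search space than repeated calls to a uniform generator started from the same seed. The sketch below assumes the common skewed Tent map x_{n+1} = x/a if x < a, else (1 - x)/(1 - a), with a hypothetical a = 0.7 and seed x0 = 0.37; the abstract does not state the thesis's exact map parameters. Each chaotic value in (0, 1) is then scaled into the search bounds.

```python
def tent_map_sequence(length, x0=0.37, a=0.7):
    """Return `length` iterates of the skewed Tent map (a and x0 are assumed values)."""
    seq, x = [], x0
    for _ in range(length):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        # Keep x strictly inside (0, 1) so floating-point rounding cannot
        # trap the sequence at the fixed point 0.
        x = min(max(x, 1e-6), 1.0 - 1e-6)
        seq.append(x)
    return seq


def tent_init(pop_size, dim, lower, upper, x0=0.37, a=0.7):
    """Map one chaotic sequence onto a pop_size x dim population within [lower, upper]."""
    seq = tent_map_sequence(pop_size * dim, x0, a)
    return [
        [lower[j] + seq[i * dim + j] * (upper[j] - lower[j]) for j in range(dim)]
        for i in range(pop_size)
    ]


population = tent_init(20, 3, [-5.0] * 3, [5.0] * 3)
```

Using one long sequence, rather than restarting the map for each bee, avoids giving every food source the same trajectory.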
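The Gbest factor and the crossover operation can be illustrated with a single employed-bee update. The move below is modeled on the Gbest-guided ABC rule v_j = x_j + φ(x_j − x_kj) + ψ(gbest_j − x_j), with φ ∈ [−1, 1] and ψ ∈ [0, C]. It is followed by a DE-style binomial crossover between the candidate and the current source, and then by ABC's greedy selection. The crossover type, C = 1.5 and the crossover rate cr = 0.5 are assumptions for illustration, not the thesis's confirmed settings. Scout-phase trial counters are omitted.

```python
import random


def sphere(x):
    """Standard test function f(x) = sum(x_j^2), minimum 0 at the origin."""
    return sum(v * v for v in x)


def gbest_crossover_step(foods, i, gbest, f, lower, upper, C=1.5, cr=0.5, rng=random):
    """Update food source i in place (minimization); return True if it improved."""
    sn, dim = len(foods), len(foods[i])
    k = rng.choice([idx for idx in range(sn) if idx != i])  # random neighbour k != i
    x, xk = foods[i], foods[k]

    # Gbest-guided move: random neighbour term plus attraction toward gbest.
    v = []
    for j in range(dim):
        phi = rng.uniform(-1.0, 1.0)
        psi = rng.uniform(0.0, C)
        vj = x[j] + phi * (x[j] - xk[j]) + psi * (gbest[j] - x[j])
        v.append(min(max(vj, lower[j]), upper[j]))  # clip to the search bounds

    # Binomial crossover; j_rand guarantees at least one gene comes from v.
    j_rand = rng.randrange(dim)
    trial = [v[j] if (rng.random() < cr or j == j_rand) else x[j] for j in range(dim)]

    # Greedy selection: keep the trial only if it is strictly better.
    if f(trial) < f(x):
        foods[i] = trial
        return True
    return False


rng = random.Random(0)
foods = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(10)]
gbest = list(min(foods, key=sphere))
for i in range(len(foods)):
    gbest_crossover_step(foods, i, gbest, sphere, [-5] * 3, [5] * 3, rng=rng)
```

The ψ term pulls candidates toward the best solution found so far (exploitation), while φ and the crossover keep some randomness (exploration).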
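One common way to bring a Gaussian kernel into the K-medoids objective is to measure distances in the kernel-induced feature space. For K(x, y) = exp(−‖x − y‖² / (2σ²)), this gives ‖φ(x) − φ(y)‖² = K(x, x) − 2K(x, y) + K(y, y) = 2 − 2K(x, y). The distance is bounded above by 2, so a far-away outlier adds at most 2 to the cost instead of an unbounded squared Euclidean term. The abstract does not give the thesis's exact objective, and σ = 1 is an assumed bandwidth. The sketch below is one plausible form of the cost that IABC would minimize over candidate medoid sets.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance(x, y, sigma=1.0):
    """Squared feature-space distance; equals 2 - 2K(x, y) since K(x, x) = 1."""
    return 2.0 - 2.0 * gaussian_kernel(x, y, sigma)


def kernel_kmedoids_cost(data, medoid_idx, sigma=1.0):
    """Assign each point to its nearest medoid in kernel space; return (cost, labels)."""
    labels, cost = [], 0.0
    for x in data:
        dists = [kernel_distance(x, data[m], sigma) for m in medoid_idx]
        best = min(range(len(dists)), key=dists.__getitem__)
        labels.append(best)
        cost += dists[best]
    return cost, labels


data = [[0, 0], [0.1, 0], [5, 5], [5.1, 5], [100, 100]]  # last point is an outlier
cost, labels = kernel_kmedoids_cost(data, [0, 2])
```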
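The MapReduce parallelization can be pictured as one map/shuffle/reduce round per K-medoids iteration. The pure-Python simulation below does not use the Hadoop API, and the split of work is an assumed design. Mappers assign the points of their input split to the nearest current medoid and emit (cluster_id, point). A dictionary stands in for the shuffle stage. Each reducer picks as the new medoid the member that minimizes the total kernel distance to the rest of its cluster.

```python
import math
from collections import defaultdict


def kdist(x, y, sigma=1.0):
    """Gaussian-kernel feature-space distance 2 - 2K(x, y) (sigma is assumed)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return 2.0 - 2.0 * math.exp(-sq / (2.0 * sigma ** 2))


def map_phase(split, medoids, sigma=1.0):
    """Mapper: emit (cluster_id, point) for every point in this input split."""
    for x in split:
        cid = min(range(len(medoids)), key=lambda c: kdist(x, medoids[c], sigma))
        yield cid, x


def reduce_phase(cid, points, sigma=1.0):
    """Reducer: new medoid = member with the smallest summed distance to the others."""
    best = min(points, key=lambda p: sum(kdist(p, q, sigma) for q in points))
    return cid, best


def mapreduce_iteration(splits, medoids, sigma=1.0):
    shuffled = defaultdict(list)  # stands in for Hadoop's shuffle/sort stage
    for split in splits:  # each split would run on a separate mapper
        for cid, x in map_phase(split, medoids, sigma):
            shuffled[cid].append(x)
    new_medoids = list(medoids)  # clusters that received no points keep their medoid
    for cid, pts in shuffled.items():
        _, new_medoids[cid] = reduce_phase(cid, pts, sigma)
    return new_medoids


splits = [[[0, 0], [0.2, 0], [0.1, 0]], [[5, 5], [5.2, 5], [5.1, 5]]]
medoids = mapreduce_iteration(splits, [[0, 0], [5, 5]])
```

The reducer's cost is quadratic in cluster size. Large clusters would need sampling or a combiner in practice.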
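The three evaluation indexes can be computed as follows. This is a minimal sketch using their standard definitions. Cluster Accuracy takes the best one-to-one mapping from cluster ids to class labels, here found by brute force and therefore only suitable for small k. The Adjusted Rand Index uses the pair-counting formula ARI = (Σ_ij C(n_ij, 2) − E) / (½(Σ_i C(a_i, 2) + Σ_j C(b_j, 2)) − E), where E = Σ_i C(a_i, 2) · Σ_j C(b_j, 2) / C(n, 2). Speedup is T₁ / T_p, the single-node running time divided by the running time on p nodes.

```python
from collections import Counter
from itertools import permutations
from math import comb


def cluster_accuracy(y_true, y_pred):
    """Fraction of points correct under the best cluster-to-class mapping."""
    classes, clusters = sorted(set(y_true)), sorted(set(y_pred))
    # Pad with None so surplus clusters can map to "no class".
    labels = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(labels, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[p] == t for t, p in zip(y_true, y_pred)))
    return best / len(y_true)


def adjusted_rand_index(y_true, y_pred):
    """Pair-counting ARI (Hubert and Arabie form)."""
    n = len(y_true)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def speedup(t_serial, t_parallel):
    """Speedup S_p = T_1 / T_p."""
    return t_serial / t_parallel
```

Relabeled but otherwise identical clusterings score 1.0 on both indexes. For example, y_true = [0, 0, 1, 1, 2, 2] and y_pred = [1, 1, 0, 0, 2, 2] describe the same partition, so both functions return 1.0 for that pair.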