Ant colony clustering is a classical clustering algorithm that is widely used in many fields because it is robust and easy to combine with other algorithms. With the rapid development of the Internet and the sharp increase in data volume, the traditional ant colony clustering algorithm faces new challenges when processing large volumes of data: it can run out of memory, it cannot fully exploit the parallelism of the ant colony, and it cannot handle distributed data. The MapReduce computing framework proposed by Google is one possible solution. Many researchers have built clustering algorithms on the MapReduce framework and achieved good results, so it is worthwhile to redesign and optimize the traditional ant colony clustering algorithm on MapReduce. In this paper, we first give a detailed introduction to clustering analysis, surveying common clustering algorithms from their roots in data mining, through the classification of clustering algorithms, to the ACOC (ant colony optimization clustering) algorithm. Second, we analyze the advantages and disadvantages of ACOC. Third, we summarize the MapReduce computing framework and describe how we set up the Hadoop platform used in this paper's experiments. Fourth, we propose MR-ACOC, a parallel ant colony optimization clustering algorithm based on MapReduce. The proposed algorithm not only handles big data but also takes full advantage of the ant colony algorithm by combining the search space replication approach with the search space partition approach, and it reads the pheromone matrix and the dataset line by line to avoid running out of memory on large datasets.
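To make the data flow concrete, the following is a minimal Python sketch of one clustering pass in MapReduce style, simulated in a single process. It is not the MR-ACOC implementation: the function names (`map_assign`, `shuffle`, `reduce_center`, `run_iteration`), the comma-separated record and pheromone formats, and the roulette-wheel assignment rule are assumptions made for illustration. Each split stands for one mapper's partition of the data (search space partition), and each ant covers the full search space over that partition (search space replication).

```python
import random
from collections import defaultdict


def map_assign(data_lines, pheromone_lines, num_ants, rng):
    """Hypothetical mapper for one data split.

    Consumes one data record and its matching pheromone row at a time, so
    with file iterators as input only one row of each is needed in memory.
    Every ant picks a cluster for the record by roulette-wheel selection
    over the pheromone row and emits ((ant, cluster), (vector, 1)).
    """
    for data_line, tau_line in zip(data_lines, pheromone_lines):
        _record_id, *fields = data_line.strip().split(",")
        vector = [float(x) for x in fields]
        tau = [float(t) for t in tau_line.strip().split(",")]
        for ant in range(num_ants):
            cluster = rng.choices(range(len(tau)), weights=tau, k=1)[0]
            yield (ant, cluster), (vector, 1)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce shuffle phase does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_center(key, values):
    """Hypothetical reducer: the center of one ant's cluster.

    Each value is (vector_sum, count), so the reducer also accepts
    pre-aggregated partial sums.
    """
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for vector_sum, n in values:
        for d in range(dim):
            total[d] += vector_sum[d]
        count += n
    return key, [t / count for t in total]


def run_iteration(splits, num_ants, seed=0):
    """Map every split, shuffle, and reduce to {(ant, cluster): center}."""
    rng = random.Random(seed)
    pairs = []
    for data_lines, pheromone_lines in splits:
        pairs.extend(map_assign(data_lines, pheromone_lines, num_ants, rng))
    return dict(reduce_center(k, v) for k, v in shuffle(pairs).items())
```

As a deliberately degenerate usage example, with two splits `[(["p1,1.0,2.0", "p2,3.0,4.0"], ["1,0", "1,0"]), (["p3,5.0,6.0"], ["1,0"])]` every pheromone row puts all weight on cluster 0, so both ants place all three points there and `run_iteration(splits, num_ants=2)` returns the center `[3.0, 4.0]` for keys `(0, 0)` and `(1, 0)`.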
Fifth, we optimize MR-ACOC in several ways: splitting the data clustering and center computing modules, inserting a combiner function between the objective function computing module and the center point computing module, replacing the HDFS data source with an HBase data source, and so on. Experimental results show that MR-ACOC achieves good scaleup and speedup on large volumes of data.
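The benefit of a combiner can be sketched in the same simulated style. Because an average of partial averages is generally not the overall average, this hypothetical combiner emits per-key vector sums and counts instead of partial means, and the reducer divides only once. The names `combine` and `reduce_centers` and the `((ant, cluster), (vector, count))` record layout are assumptions for illustration, not the thesis's actual module interfaces.

```python
from collections import defaultdict


def combine(pairs):
    """Hypothetical combiner run on one mapper's local output.

    Sums vectors and counts per (ant, cluster) key so that a single
    (vector_sum, count) record per key leaves the mapper.
    """
    partial = {}
    for key, (vector, n) in pairs:
        if key in partial:
            acc, count = partial[key]
            partial[key] = ([a + v for a, v in zip(acc, vector)], count + n)
        else:
            partial[key] = (list(vector), n)
    return list(partial.items())


def reduce_centers(pairs):
    """Group by key, then divide the summed vectors by the summed counts."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    centers = {}
    for key, values in groups.items():
        dim = len(values[0][0])
        total = [sum(vec[d] for vec, _ in values) for d in range(dim)]
        count = sum(n for _, n in values)
        centers[key] = [t / count for t in total]
    return centers


mapper_output = [
    ((0, 0), ([1.0, 2.0], 1)),
    ((0, 0), ([3.0, 4.0], 1)),
    ((0, 1), ([5.0, 6.0], 1)),
]
combined = combine(mapper_output)
# Two records instead of three go through the shuffle; the centers match.
print(len(combined), reduce_centers(combined) == reduce_centers(mapper_output))
```

Here the combiner shrinks three mapper records to two before the shuffle without changing the resulting centers (`[2.0, 3.0]` for `(0, 0)` and `[5.0, 6.0]` for `(0, 1)`); on real data the saving grows with the number of records per (ant, cluster) key in each split.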