
Research On Improvement Of Two Kinds Of Clustering Algorithms And Its Application

Posted on: 2021-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: M Sun
Full Text: PDF
GTID: 2428330602989836
Subject: Mathematics
Abstract/Summary:
In the era of data explosion, effectively analyzing and managing data has become particularly important, and cluster analysis is a key technique for classifying data. It can serve as an independent tool to preprocess data, analyze the distribution of data, and understand the characteristics of different kinds of data, and it can also support other data mining functions as an auxiliary step. Because of this application value, cluster analysis has long been a hot topic in the field. The partition-based ant colony clustering algorithm is an intelligent clustering algorithm modeled on the foraging behavior of ants. Although it is a self-organizing, positive-feedback, inherently parallel, and robust optimization algorithm, it still tends to become trapped in local optima. The density-based CFSFDP (Clustering by Fast Search and Find of Density Peaks) algorithm is simple in concept, easy to implement, and clusters well, but when a cluster contains multiple density peaks it is prone to selecting the wrong cluster centers. To address the shortcomings of these two types of algorithms, this thesis proposes two improvement schemes.

To address the tendency of the ant colony clustering algorithm to fall into local optima, an ant colony clustering algorithm based on a random search mutation strategy is proposed. It performs mutation operations through random selection in order to improve the search ability of the algorithm. Numerical simulation experiments on UCI data sets show that the improved algorithm is superior to the basic ant colony clustering algorithm and the k-means algorithm in terms of the objective function value F, the Rand index, the adjusted Rand index, and normalized mutual information. The results indicate that the improved algorithm has better optimization ability on numerical optimization problems.

When a single cluster contains multiple density peaks, it is difficult to determine the number of cluster centers for the CFSFDP algorithm from the decision graph, which leads to poor clustering. The proposed method places, for each point, the data points whose density is greater than that of the current point, together with their minimum distance from the current point, into separate sets, and sorts the local densities obtained with the Gaussian kernel. When there are multiple density peaks, only the first point is selected as the cluster center, and the number of cluster centers is determined from the distribution map of the normalized values. Numerical simulation experiments on artificial data sets and UCI data sets show that the improved CFSFDP algorithm is superior to the CFSFDP, DBSCAN, and k-means algorithms in terms of the adjusted Rand index, homogeneity, completeness, V-measure, and normalized mutual information score. The improved algorithm remedies the poor performance of the CFSFDP algorithm on clusters with multiple density peaks and is suitable for clustering arbitrary data sets of lower dimension.

Finally, the two improved algorithms are each applied to a practical problem. The improved ant colony clustering algorithm is used to perform feature analysis and clustering on all worldwide terrorist attack data samples from 2015 and 2016 that are attributed to unknown terrorist organizations or individuals, and to roughly infer the number of new terrorist organizations or individuals, which provides a basis for subsequently determining the perpetrator ("Claimed") of each event. Web crawler technology was used to collect data from 21 meteorological stations in Canada for 1976 to 2004. After all the data had been preprocessed, a cluster analysis model based on the improved CFSFDP algorithm divided the 21 stations into 6 categories, and the spatiotemporal changes in temperature in Canada were analyzed from the annual data in 5-year periods. These applications show that the two improved algorithms have practical value.
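
As an illustration of the decision-graph quantities mentioned in the abstract, the following is a minimal sketch of the baseline CFSFDP scores, not the improved algorithm proposed in the thesis. It assumes Euclidean distance, a Gaussian-kernel local density with a user-chosen cutoff distance `dc`, and a min-max normalization of the two quantities before they are multiplied. The function name `density_peak_scores` and the variable names are hypothetical and chosen only for this example.

```python
import math


def density_peak_scores(points, dc):
    """Return (rho, delta, gamma) per point, following baseline CFSFDP.

    points: list of coordinate tuples; dc: cutoff distance (assumed given).
    """
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i with a Gaussian kernel.
    rho = [
        sum(math.exp(-((dist[i][j] / dc) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # delta_i: distance to the nearest point of strictly higher density.
    # A point with no denser neighbor gets its largest distance instead.
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))

    def normalize(values):
        # Min-max scaling to [0, 1]; constant input maps to zeros.
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

    # gamma_i: product of normalized rho and delta; large values
    # mark candidate cluster centers.
    gamma = [r * d for r, d in zip(normalize(rho), normalize(delta))]
    return rho, delta, gamma
```

Points with the largest `gamma` are the usual candidates for cluster centers. When one cluster contains several density peaks, more than one of its points can score highly, which is the failure case the abstract describes.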
Keywords/Search Tags: mutation, ant colony clustering algorithm, multi-density peaks, CFSFDP algorithm, counter-terrorism, temperature