| In the era of big data, every industry generates huge amounts of data, and mining useful information from these data has become a major challenge. Efficient clustering algorithms are not only a foundation of data mining but also an important means of extracting useful information, and they remain an active research topic. In June 2014, the article "Clustering by fast search and find of density peaks" was published in Science. It introduced a new clustering algorithm, the density peak clustering algorithm, which is simple and efficient and has been applied successfully in many fields. However, the algorithm also has the following problems: (1) when calculating the local density, the cutoff (truncation) distance must be chosen according to the researcher's experience; (2) the method for calculating the local density of a sample is too simple; (3) the remaining sample points are assigned by a "one-step" allocation mechanism, which has a high probability of misallocation. This paper studies these shortcomings of the density peak clustering algorithm in depth and optimizes and improves the algorithm on that basis. The specific research content and results are as follows. 1. A physics-optimized density peak clustering algorithm (the W-CFSDPC algorithm) is proposed. Its main purpose is to improve clustering quality as much as possible. First, inspired by physics, the density peak clustering algorithm is re-analyzed from the perspective of mechanics so that it fully reflects the spontaneous aggregation and dispersion of data, bringing different disciplines together. The underlying idea is that all things exert some force on one another, and so do the sample points of data sets of different sizes. The local density of a sample is redefined according to the law of gravitation, so that the surroundings of each sample point are taken into account to the greatest extent. Then, an improved form of the definition of the first cosmic velocity is used to calculate the "force" between each remaining sample point and the center of mass of each cluster, which divides the remaining sample points into two cases, points that must belong to a cluster and points that may belong to it, and a different allocation mechanism is used for each case. Finally, the algorithm is compared with five clustering algorithms. The numerical results show that it is a good clustering algorithm: it identifies the center of mass of each cluster accurately and allocates the remaining sample points more accurately.
2. A density peak clustering algorithm based on the best neighbor variance balance (the F-CFSDPC algorithm) is proposed. First, inspired by the best neighbor variance balance method, the algorithm studies the equilibrium state of the data points and integrates the idea of the best neighbor into the density peak clustering algorithm; the best neighbor variance balance method determines the cutoff distance adaptively, which reduces the uncertainty introduced by human factors. Then, a tree is constructed from the best neighbors of the data points, and constructing and decomposing this tree allocates the remaining sample points quickly and accurately. Finally, the F-CFSDPC algorithm is compared with the SNN-CFSDPC, FKNN-CFSDPC, CFSDPC, and other algorithms. The experimental results show that the F-CFSDPC algorithm improves on the other algorithms in the F-measure and FMI indicators and yields better clustering quality. |
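
For orientation, the three problems listed above can be seen in a minimal sketch of the baseline density peak clustering procedure as it is commonly described. This is not the W-CFSDPC or F-CFSDPC algorithm. The function name `density_peaks` and the parameters `d_c` and `n_clusters` are illustrative, and picking centers as the points with the largest product of density and distance is an assumption made here in place of reading them off a decision graph.

```python
import numpy as np


def density_peaks(X, d_c, n_clusters):
    """Baseline density peak clustering sketch (illustrative names, d_c > 0)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between all sample points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Problems (1) and (2): the local density is a plain neighbor count
    # inside a hand-picked cutoff distance d_c (self is excluded).
    rho = (dist < d_c).sum(axis=1) - 1

    # Visit points from highest to lowest density (stable for ties).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: delta is its largest distance to any point.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]  # distance to nearest denser point
        parent[i] = j

    # Assumption: take the n_clusters largest rho * delta as centers.
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)

    # Problem (3): "one-step" allocation. Each remaining point copies the
    # label of its nearest denser neighbor, so one early mistake propagates.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers, rho, delta
```

Calling `density_peaks(X, d_c=0.9, n_clusters=2)` on two well-separated groups of points returns one label per point together with the indices of the chosen centers. The result depends directly on the value passed for `d_c`, which is the sensitivity that the adaptive cutoff in the F-CFSDPC algorithm is meant to reduce.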