| Clustering algorithms have become an indispensable part of data mining. With the rapid development of information technology, research on clustering algorithms and their improvement has deepened. A clustering algorithm, as the name suggests, partitions a sample set according to the characteristics (or attributes) of the samples and a specified metric. It is an unsupervised learning method, so the class membership of each sample point need not be known in advance. Traditional clustering algorithms require the number of clusters, the initial cluster centers, thresholds, and other related parameters to be set in advance. These parameters are chosen by experience (that is, there is no clear standard), and different parameter values strongly affect the clustering results, which makes the algorithms difficult to apply in practice and makes the results unstable. It is therefore especially important to improve the adaptability of clustering algorithms so that they depend only on the characteristics of the sample objects. To address the shortcomings of the evolving clustering algorithm and the fuzzy clustering algorithm, this paper proposes two adaptive improvements, which greatly improve the stability and accuracy of the original algorithms. (1) The evolving clustering algorithm is an online clustering method that adds clusters and adjusts the cluster centers and cluster radii in real time to obtain the best clustering result. However, the traditional evolving clustering algorithm requires a threshold to be set in advance. Without prior knowledge of the dataset, the threshold is difficult to choose, and different thresholds greatly affect the final clustering result; in addition, the evolving clustering algorithm is sensitive to the input order of the sample points. To address these shortcomings of the traditional evolving clustering algorithm, this paper proposes an improved adaptive evolving 
clustering algorithm, which first pre-clusters several data sample points to obtain an initial clustering result; then adjusts the cluster centers and cluster radii according to the new sample points; and finally chooses between splitting and merging clusters according to the classification and adjusts the clustering result again to obtain the best clustering result. The algorithm significantly reduces the dependence of the clustering results on the choice of threshold and on the input order of the sample points, and is better suited to practical data classification problems. The experimental results show that the improved adaptive clustering algorithm achieves dynamic online clustering and significantly improves the accuracy and stability of the clustering results. (2) The fuzzy C-means clustering algorithm is a widely used clustering algorithm. It classifies samples according to the membership degree of each sample point in each cluster, which removes the restriction in traditional clustering algorithms that a membership degree must be either 1 or 0. However, the traditional fuzzy clustering algorithm requires its parameters and the initial cluster centers to be set in advance, and different choices of parameters and initial cluster centers greatly affect the final clustering result. To address these shortcomings of the traditional fuzzy clustering algorithm, this paper proposes a dynamic fuzzy clustering algorithm based on weight difference. First, the concepts of a sample feature weight vector and a sample-to-sample difference are introduced to describe the distribution of the dataset, and candidate cluster centers are obtained using a new evaluation index. Then, the remaining sample points are classified according to the minimum-difference criterion. Finally, the candidate cluster centers are further screened in combination with the Davies-Bouldin index evaluation criterion. The algorithm dynamically determines the number of clusters and the initial cluster centers according to the spatial distribution of each 
sample point in the specific dataset, and considers both inter-class and intra-class dispersion, which effectively reduces the influence of randomly selecting the initial cluster centers. The experimental results show that, across the different test datasets, the algorithm performs significantly better than the traditional clustering algorithm and has higher adaptability and stability. |
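
The membership-degree idea described in (2) can be illustrated with the standard fuzzy C-means membership formula. The sketch below is not the weight-difference algorithm proposed in this paper, whose details the abstract does not give. The function name `fcm_memberships`, the fuzzifier value `m = 2`, and the use of Euclidean distance are assumptions chosen for illustration only.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Return the fuzzy C-means membership degree of `point` in each cluster.

    Standard formula (assumed here, not taken from the paper):
        u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    where d_i is the Euclidean distance from the point to center i
    and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]

    # If the point coincides with one or more centers, it belongs fully
    # to those centers (shared equally); this avoids division by zero.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        share = 1.0 / len(zeros)
        return [share if i in zeros else 0.0 for i in range(len(centers))]

    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Hypothetical data: two cluster centers and one sample point between them.
centers = [(0.0, 0.0), (4.0, 0.0)]
memberships = fcm_memberships((1.0, 0.0), centers)
print(memberships)
```

With these hypothetical centers, the point at distance 1 from the first center and distance 3 from the second receives memberships of about 0.9 and 0.1, so it belongs mostly, but not exclusively, to the first cluster. The memberships always sum to 1, which is what distinguishes this soft assignment from the 0-or-1 assignment of hard clustering.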