
Research On A Clustering Algorithm Based On Density And Hierarchy

Posted on: 2018-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: H T Wu
Full Text: PDF
GTID: 2348330563952573
Subject: Computer technology
Abstract/Summary:
Clustering is one of the most commonly used techniques in data mining and has attracted growing attention. Clustering by Fast Search and Find of Density Peaks (CFSFDP) is a clustering algorithm based on density peaks. The density peaks are identified by plotting and inspecting a decision graph; the density-peak points are then set as cluster centers, and the remaining points are assigned to clusters according to those centers. CFSFDP is fast, easy to understand and implement, and able to detect clusters of different shapes. However, the algorithm has two shortcomings:

(1) Because the cut-off distance (d_c) is chosen manually, CFSFDP may produce multiple density peaks within a single cluster, which contradicts the assumption that each density peak corresponds to one cluster center. When multiple density peaks appear in one cluster, clustering errors result.

(2) The selection of cluster centers depends on the decision graph generated by the algorithm: users must inspect the graph and select the centers by hand. This interrupts the flow of the algorithm and lowers its efficiency, and it can also lead to selecting too many density peaks or missing some.

To address these problems, this thesis proposes a clustering algorithm, built on CFSFDP, that combines density-based and hierarchical clustering. By introducing the aggregation discriminant algorithm from the system evolution algorithm, the initial clustering results produced by the improved CFSFDP algorithm are re-aggregated hierarchically, so that objects that belong to the same cluster are grouped together. The main work of this thesis is as follows:

1. A new clustering algorithm based on density and hierarchy is proposed. The algorithm has two stages. In the first stage, the density-based CFSFDP algorithm produces an initial clustering of the data. In the second stage, clusters in the initial result that are fragments of the same underlying cluster are merged using the discriminant algorithm from the system evolution algorithm. Whether two clusters can be merged is decided by calculating the degree of separation of their edge regions and the degree of dispersion before and after merging.

2. A method for automatically obtaining the cluster centers in CFSFDP, based on weight differences, is proposed. The weight of each point in the data set is calculated and the weights are sorted in descending order. The differences between adjacent weights in the sorted list are then computed, the critical point of the last significant weight change is located from these differences, and the points whose weights are larger than that of the critical point are taken as cluster centers. In addition, to prevent data density from affecting the selection of cluster centers, multiple cut-off (truncation) distances are used so as to obtain as many candidate centers as possible, and centers that lie close to one another are then merged into one.

3. To improve computational efficiency, the aggregation discriminant algorithm of the system evolution algorithm is used to judge whether clusters should be merged during hierarchical clustering. This thesis modifies the way adjacent and non-adjacent regions are selected in the original algorithm, which reduces the number of comparisons relative to the original method. Adding a nearest-distance table further reduces the number of comparisons needed when the minimum average path is calculated. Because the original algorithm cannot handle the merging of slightly overlapping clusters, the rate of change of the sum of squared errors before and after merging is used as a supplementary criterion, which lets the algorithm make merge decisions for slightly overlapping clusters.

4. CFSFDP sometimes divides one cluster into two or more. This thesis uses the improved aggregation discriminant algorithm, with two linked-list arrays as the storage structure for clusters, to perform a secondary clustering on the result of the improved CFSFDP algorithm.

5. A number of experiments show that the proposed density-based and hierarchical clustering algorithm achieves somewhat higher accuracy than the original algorithm. The new algorithm also adapts better to data of various shapes and differing density distributions. The final results are as desired.
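
The center-selection idea in item 2 can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several choices are assumptions: the abstract does not define the "weight", so the sketch uses gamma = rho * delta, where rho (number of neighbours within d_c) and delta (distance to the nearest denser point) are the usual CFSFDP quantities; the rule for a "significant" weight change (a drop of at least `min_jump_ratio` times the largest drop) and all function names are hypothetical. The multiple cut-off distances and the hierarchical merging stage are not shown.

```python
import numpy as np

def density_peak_scores(points, d_c):
    """Return (rho, delta, nearest_denser) using the usual CFSFDP definitions."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # rho: number of *other* points closer than the cut-off distance d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    n = len(points)
    delta = np.empty(n)
    nearest_denser = np.full(n, -1)
    # Densest first; the stable sort breaks density ties by point index.
    order = np.argsort(-rho, kind="stable")
    # The densest point has no denser neighbour: use its largest distance.
    delta[order[0]] = dist[order[0]].max()
    for k in range(1, n):
        i = order[k]
        denser = order[:k]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        nearest_denser[i] = j
    return rho, delta, nearest_denser

def select_centers(gamma, min_jump_ratio=0.5):
    """Pick centers as the points ranked before the last significant drop.

    min_jump_ratio is a hypothetical threshold, not taken from the thesis.
    """
    order = np.argsort(-gamma, kind="stable")
    sorted_gamma = gamma[order]
    drops = sorted_gamma[:-1] - sorted_gamma[1:]  # adjacent weight differences
    if drops.size == 0 or drops.max() <= 0:
        return order[:1]  # degenerate case: no visible gap in the weights
    significant = np.flatnonzero(drops >= min_jump_ratio * drops.max())
    last = significant[-1]  # position of the last significant change
    return order[: last + 1]

def assign_labels(rho, nearest_denser, centers):
    """Give each non-center point the label of its nearest denser point."""
    labels = np.full(len(rho), -1)
    labels[centers] = np.arange(len(centers))
    for i in np.argsort(-rho, kind="stable"):
        if labels[i] == -1 and nearest_denser[i] != -1:
            labels[i] = labels[nearest_denser[i]]
    return labels

# Toy data: two plus-shaped groups; points 0 and 5 are the group midpoints.
points = np.array(
    [[0, 0], [1, 0], [-1, 0], [0, 1], [0, -1],
     [10, 10], [11, 10], [9, 10], [10, 11], [10, 9]],
    dtype=float,
)
rho, delta, nearest_denser = density_peak_scores(points, d_c=1.5)
centers = select_centers(rho * delta)
labels = assign_labels(rho, nearest_denser, centers)
print(centers, labels)  # the two midpoints (indices 0 and 5) are selected
```

On this toy data the two midpoints have weights far above the rest, so the last significant drop falls right after them and no decision graph has to be inspected by hand. How sensitive the result is to the threshold on real data is exactly the kind of question the thesis's experiments would need to answer.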
Keywords/Search Tags: Clustering Analysis, CFSFDP Algorithm, System Evolutionary Aggregation Discriminant Algorithm, Hierarchical Clustering