Research On Improved Hierarchical Clustering Algorithm Based On Density

Posted on: 2017-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Li
GTID: 2308330503961510
Subject: Electronic and communication engineering
Abstract/Summary:
Data mining is an important tool for finding valuable information in massive data, and cluster analysis is an important research direction within data mining. Cluster analysis has been successfully applied in biology, statistics, machine learning, business decision-making, and other fields. Existing clustering algorithms are each targeted at particular problems, and the search for more efficient, more accurate, and more comprehensive clustering algorithms remains an active research topic. Hierarchical clustering is an important branch of cluster analysis. In this paper, we focus on hierarchical clustering algorithms and compare the clustering performance of several representative ones. CURE is a typical hierarchical clustering algorithm. It is sensitive to the shrinkage factor, and it is difficult to define noise and isolated points in it. To address these shortcomings of CURE, we propose an improved hierarchical clustering algorithm based on density layering. The improved algorithm sorts the points of the dataset by density and removes roughly the 10% of points with the lowest density, which are treated as deviation points. These deviation points include the noise and isolated points. The remaining points are stratified by density, and hierarchical clustering is carried out on the two layers with the maximum and minimum density. Based on the results of this hierarchical clustering, all the remaining points are then clustered. Finally, each deviation point is assigned to the nearest cluster that has already been formed. The improved algorithm is not sensitive to noise and outliers and does not need a shrinkage factor parameter. It also clusters many kinds of non-spherical clusters well.
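The density-ranking step described above can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say how density is estimated, so the sketch assumes the inverse of the mean distance to the k nearest neighbours, and the names `knn_density`, `split_by_density`, `k`, and `n_layers` are hypothetical. Only the roughly 10% removal fraction comes from the abstract.

```python
import numpy as np


def knn_density(X, k=3):
    """Assumed density estimate: inverse mean distance to the k nearest neighbours."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the zero distance of a point to itself; skip it.
    nearest = np.sort(dist, axis=1)[:, 1:k + 1]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def split_by_density(X, k=3, deviation_frac=0.10, n_layers=3):
    """Return (deviation_idx, layers).

    deviation_idx: indices of the lowest-density points (about 10% by default).
    layers: the remaining indices split into n_layers groups, ordered from the
            highest-density layer to the lowest-density layer.
    """
    density = knn_density(X, k)
    order = np.argsort(density)                 # ascending density
    n_dev = int(round(deviation_frac * len(X)))
    deviation_idx = order[:n_dev]
    remaining = order[n_dev:][::-1]             # descending density
    layers = np.array_split(remaining, n_layers)
    return deviation_idx, layers
```

In this sketch, `layers[0]` and `layers[-1]` would be the maximum- and minimum-density layers on which hierarchical clustering is run first; the deviation points are set aside and assigned to the nearest cluster at the end.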
Experiments show that the clustering performance and efficiency of the improved algorithm are better than those of the CURE algorithm.

In addition, this paper analyzes CBDP, a new density-based algorithm proposed by Alex Rodriguez in the 2014 paper "Clustering by Fast Search and Find of Density Peaks". The CBDP algorithm can only handle classes with uneven data distributions and datasets in which the density gap between classes is not large. To address these shortcomings of CBDP, we propose an improved clustering algorithm. The improved algorithm eliminates noise and isolated points by calculating the density and distance of the data, so it is not sensitive to noise and outliers. The density peaks are determined by plotting the distribution curve of the product of density and distance. Taking each density peak as a class center, we then calculate the minimum distance between pairs of classes and merge classes according to this distance until the required number of clusters is reached. Experiments on datasets show that the clustering results of the improved algorithm are clearly better than those of the CBDP algorithm, and that the clustering effect is more stable.
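The density-and-distance product used to locate density peaks can be illustrated with a minimal sketch of the scores from the original density peaks method: a local density rho (the number of neighbours within a cutoff), a distance delta to the nearest point of higher density, and their product gamma. The function names, the cutoff parameter `d_c`, and the tie-breaking by index are assumptions made for illustration. The thesis's own noise removal and class merging steps are not specified in the abstract and are not reproduced here.

```python
import numpy as np


def density_peak_scores(X, d_c):
    """Return (rho, delta, gamma) for every point in X.

    rho:   number of other points closer than the cutoff d_c
    delta: distance to the nearest point ranked as denser
           (for the densest point, its largest distance to any point)
    gamma: rho * delta, large only for points that are both dense and
           far from any denser point
    """
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    rho = (dist < d_c).sum(axis=1) - 1          # subtract the point itself
    # Rank from densest to sparsest; the stable sort breaks ties by index,
    # which is a simplification chosen for this sketch.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()
    return rho, delta, rho * delta


def pick_peaks(gamma, n_peaks):
    """Pick the n_peaks points with the largest gamma as candidate centers."""
    return np.argsort(-gamma)[:n_peaks]
```

Sorting `gamma` in descending order and plotting it gives a curve in which candidate peaks stand out as the few leading values; `pick_peaks` simply takes the top `n_peaks` entries instead of reading them off a plot.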
Keywords/Search Tags: Cluster analysis, Hierarchical clustering, CURE algorithm, CBDP algorithm