Research On Manifold-based Density Peaks Clustering Algorithm

Posted on: 2017-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Ju
Full Text: PDF
GTID: 2308330488495180
Subject: Computer application technology
Abstract/Summary:
Clustering is important because of its wide range of applications, including data mining, pattern recognition, image processing, and data compression. It assigns samples to clusters according to their similarity: differences between samples in the same cluster should be as small as possible, and differences between samples in different clusters as large as possible.

Density-based clustering algorithms use the density of the spatial distribution as the basis for clustering. They regard clusters as high-density regions of the data space separated by low-density regions; when the density of a sample exceeds a specified threshold, the sample is added to the cluster of similar samples. With the development of data mining and machine learning, researchers have proposed many density-based clustering algorithms.

This thesis studies a recent density-based algorithm, Clustering by Fast Search and Find of Density Peaks, and makes several improvements to it. In addition, we extend it to manifold space and apply it to evolutionary data. The main research work and results are as follows:

(1) We present a geodesic density peaks clustering algorithm. The original algorithm takes a distance matrix as its input similarity matrix. However, across industries and applications, the choice of distance calculation has a great impact on the final clustering results. We therefore adopt geodesic distance, which, all things considered, best reflects the actual distance relationships between samples, as a unified standard. In addition, the original algorithm requires the user to select the centroids manually by dragging a rectangle with the mouse, which is inconvenient and subjective. We instead identify the centroids automatically from the given number of clusters, which improves the efficiency of the algorithm.

(2) We propose a manifold density peaks clustering algorithm. Because the original algorithm gives unsatisfactory results on high-dimensional datasets, isometric mapping is introduced to map high-dimensional data into a lower-dimensional space, achieving dimensionality reduction. We also introduce non-negative matrix factorization for comparison with isometric mapping; the experimental results show that the manifold-based isometric mapping is more suitable.

(3) To handle evolutionary data, we extend the manifold density peaks clustering algorithm to an evolutionary setting. As network applications continue to emerge, large amounts of data are generated every second, which has drawn increasing attention to the timely and efficient analysis and processing of such data.
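The idea in item (1) can be illustrated with a minimal sketch. It is not the thesis's implementation, and several choices are assumptions the abstract does not confirm: geodesic distance is approximated by shortest paths over a symmetric k-nearest-neighbor graph (the construction used by isometric mapping), local density is a simple cutoff count, and centroids are picked automatically as the samples with the largest product of density and distance to the nearest denser sample. The names `geodesic_distances`, `density_peaks`, `k`, and `cutoff` are hypothetical, and the sketch assumes the neighbor graph is connected.

```python
import heapq
import math


def geodesic_distances(points, k):
    """Approximate geodesic distances by shortest paths on a k-NN graph.

    Assumption: the graph is connected; otherwise some entries stay infinite.
    """
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    graph = [dict() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: euclid[i][j])
        for j in others[:k]:
            # Store each edge in both directions so the graph is symmetric.
            graph[i][j] = euclid[i][j]
            graph[j][i] = euclid[i][j]
    dist = [[math.inf] * n for _ in range(n)]
    for src in range(n):
        # Dijkstra's algorithm from every source sample.
        dist[src][src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[src][u]:
                continue
            for v, w in graph[u].items():
                if d + w < dist[src][v]:
                    dist[src][v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return dist


def density_peaks(dist, n_clusters, cutoff):
    """Density peaks clustering on a symmetric distance matrix.

    Centers are chosen automatically (an assumed rule): the n_clusters
    samples with the largest rho * delta.
    """
    n = len(dist)
    # rho: number of other samples closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    # Visit samples from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest sample gets its largest distance to any sample.
            delta[i] = max(dist[i])
        else:
            # Nearest sample among those that come earlier (are denser).
            parent[i] = min(order[:rank], key=lambda j: dist[i][j])
            delta[i] = dist[i][parent[i]]
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:n_clusters]
    labels = [-1] * n
    for label, i in enumerate(centers):
        labels[i] = label
    # Each remaining sample inherits the label of its nearest denser sample.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

A call such as `density_peaks(geodesic_distances(points, k=5), n_clusters=2, cutoff=0.15)` returns one label per sample together with the indices of the chosen centers. Supplying the number of clusters replaces the manual rectangle selection described above; the ranking rule itself is only one plausible way to do this.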
Keywords/Search Tags: clustering, density peaks, geodesic distance, isometric mapping, non-negative matrix factorization, evolutionary data