
Research On Density Peak Clustering And Its Application In Community Detection

Posted on: 2020-07-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Ding
GTID: 1368330599976107
Subject: Control Science and Engineering
Abstract/Summary:
Clustering algorithms play an important role in data mining. They can reveal the structure of a data set without any class-label information, and they can also serve as a preliminary step for other learning procedures. In 2014, Alex Rodriguez and Alessandro Laio published the paper "Clustering by fast search and find of density peaks" in Science. It proposes a fast clustering algorithm based on the density of points and the distances between them. The idea behind this algorithm is that cluster centers have a higher density than their neighbors and lie relatively far from any point of higher density. The algorithm, named density peak clustering, is simple and efficient, and it has attracted considerable attention from researchers. It also offers a new perspective on using clustering to solve data-mining problems. This thesis has three aims: to analyze the mathematical principles of density peak clustering, to explore methods for selecting centers automatically, and to apply density peak clustering to community detection. The main contents of this thesis are summarized as follows:

1. Density peak clustering is analyzed mathematically with respect to two of its features: the manual selection of centers on the decision diagram and its efficient label propagation. Selecting centers on the decision diagram amounts to setting a distance threshold and a density threshold. Under certain conditions, the selected points are local maxima of the density distribution of the data set. Label propagation is examined from the viewpoint of supervised learning. When a point's nearest neighbor of higher density has been assigned correctly, the error rate of that point's label assignment does not exceed twice the error rate of the Bayes-optimal classifier.

2. Statistical theory is used to remove the need to analyze the decision diagram and select centers manually. The generalized extreme value distribution of a newly proposed judgment indicator is estimated by maximum likelihood, and an upper bound is then derived from it. Points whose judgment indicator exceeds this upper bound are selected as centers automatically. Because this approach is computationally costly, an alternative method based on Chebyshev's inequality is also given. Experiments on 20 numeric data sets show the effectiveness of both methods.

3. Density peak clustering is applied to community detection. Based on the characteristics of community centers, a method for measuring node density is proposed. Density peak clustering suffers from a "domino effect": one misassigned node causes the nodes that follow it to be misclassified. To overcome this, a two-stage label propagation is proposed. After the community centers are selected, the algorithm first forms seed regions and then propagates labels according to the neighbors of each node. Experiments verify the performance of the proposed algorithm.

4. A new distance, called relational distance, is proposed to handle complex connections among centers, such as "rich clubs" in the network. It considers not only the connection between two nodes but also the connections between their first-order neighbors. To further improve accuracy, a multi-algorithm collaboration framework is introduced. After the community centers are selected, their labels are propagated randomly multiple times, and all of the results are merged to obtain the final community structure. Mathematical analysis and experimental comparisons show the effectiveness of the algorithm.

Finally, the main research results of this thesis are summarized, and directions for future research are proposed, including a multi-algorithm collaboration framework for compressed sensing.
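The following is a minimal sketch of the original Rodriguez-Laio procedure that item 1 analyzes. It assumes Euclidean distances and a cutoff kernel for density. The names d_c, rho_min and delta_min are illustrative and do not follow the thesis's notation. The two thresholds play the role of boxing centers on the decision diagram. The final loop is the single-pass label propagation whose error the thesis bounds.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def density_peak_clustering(points: Sequence[Point], d_c: float,
                            rho_min: float, delta_min: float) -> List[int]:
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho_i: cutoff-kernel local density = number of other points within d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Rank points by decreasing density; ties are broken by index so that
    # "higher density" becomes a strict order.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point ranked above i (its "parent").
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Usual convention: the densest point gets its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers: points with rho >= rho_min and delta >= delta_min (the
    # upper-right corner of the decision diagram). The densest point is
    # always a center, so every chain of parents ends at a labeled point.
    labels = [-1] * n
    next_label = 0
    for i in order:
        if parent[i] == -1 or (rho[i] >= rho_min and delta[i] >= delta_min):
            labels[i] = next_label
            next_label += 1
        else:
            # Label propagation: copy the label of the nearest higher-density
            # point, which was labeled earlier in this loop.
            labels[i] = labels[parent[i]]
    return labels
```

Because points are visited in decreasing density, each point inherits a label that was fixed earlier in the loop. This one-step chain is also why a single wrong assignment can cascade, which is the "domino effect" that item 3 addresses.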
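The factor of two in item 1 parallels the classical Cover-Hart bound for the nearest-neighbor rule, stated here for reference. In that bound, M is the number of classes, R* is the Bayes risk and R is the asymptotic (large-sample) 1-NN risk. The thesis's own result instead conditions on the nearest higher-density neighbor being labeled correctly, so its exact assumptions may differ from this classical setting.

```latex
R^{*} \;\le\; R \;\le\; R^{*}\left(2 - \frac{M}{M-1}\,R^{*}\right) \;\le\; 2R^{*}
```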
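The Chebyshev-based alternative in item 2 can be sketched as follows. The abstract does not define the newly proposed judgment indicator. This sketch therefore assumes the score gamma_i = rho_i * delta_i from the original density peak paper and takes its values as input. The parameter alpha is a hypothetical tail-probability level.

```python
import math
import statistics
from typing import List, Sequence


def chebyshev_centers(gamma: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return indices whose indicator exceeds a Chebyshev upper bound.

    gamma is assumed (not taken from the thesis) to be rho_i * delta_i.
    Chebyshev's inequality, P(|X - mu| >= k*sigma) <= 1/k**2, holds for any
    distribution with finite variance; with k = 1/sqrt(alpha), the chance that
    an ordinary point exceeds mu + k*sigma is at most alpha.
    """
    mu = statistics.fmean(gamma)
    sigma = statistics.pstdev(gamma)
    k = 1.0 / math.sqrt(alpha)
    bound = mu + k * sigma
    return [i for i, g in enumerate(gamma) if g > bound]
```

Because Chebyshev's inequality is distribution-free, this bound is conservative but needs only a mean and a standard deviation. The extreme-value route in item 2 would instead take a quantile of a fitted generalized extreme value distribution (in Python, for example, via `scipy.stats.genextreme.fit`), at a higher computational cost.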
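For item 3, the sketch below illustrates the two-stage idea on a graph stored as adjacency sets. The abstract gives neither the thesis's node-density measure nor its exact seeding rule, so both are hypothetical here. A seed region is a center plus every node adjacent to that center alone, and node degree stands in for density when ordering the second stage. Centers are assumed to have been selected already.

```python
from collections import Counter
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected graph as symmetric adjacency sets


def two_stage_propagation(graph: Graph, centers: List[str]) -> Dict[str, int]:
    labels: Dict[str, int] = {c: k for k, c in enumerate(centers)}

    # Stage 1 (hypothetical seeding rule): a node adjacent to exactly one
    # center joins that center's seed region; contested nodes wait.
    for node, nbrs in graph.items():
        if node in labels:
            continue
        touching = {labels[c] for c in centers if c in nbrs}
        if len(touching) == 1:
            labels[node] = touching.pop()

    # Stage 2: remaining nodes, highest degree first (degree is a stand-in
    # for node density), take the majority label of their labeled neighbors.
    # Passes repeat until every node is labeled or a pass changes nothing.
    pending = sorted((v for v in graph if v not in labels),
                     key=lambda v: -len(graph[v]))
    changed = True
    while pending and changed:
        changed = False
        still_pending = []
        for v in pending:
            votes = Counter(labels[u] for u in graph[v] if u in labels)
            if votes:
                labels[v] = votes.most_common(1)[0][0]
                changed = True
            else:
                still_pending.append(v)
        pending = still_pending
    # Nodes in components without a center stay unlabeled (absent here).
    return labels
```

Contested nodes such as a bridge between two communities are decided only after the seed regions are fixed. They are decided by a neighborhood vote rather than by a single nearest neighbor, which limits how far one early mistake can spread.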
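The abstract does not give the formula for the relational distance in item 4. The sketch below uses a hypothetical stand-in that, like the thesis's measure, looks beyond the direct edge to first-order neighbors. It is the Jaccard distance between closed neighborhoods, where each node's neighborhood includes the node itself. This is a swapped-in measure, not the thesis's definition.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph as symmetric adjacency sets


def relational_distance(graph: Graph, u: str, v: str) -> float:
    """Hypothetical stand-in: Jaccard distance of closed neighborhoods."""
    nu = graph[u] | {u}  # u together with its first-order neighbors
    nv = graph[v] | {v}
    return 1.0 - len(nu & nv) / len(nu | nv)
```

Under this measure, two directly linked hubs whose neighborhoods barely overlap still come out relatively far apart. A plain adjacency test would treat them as close.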
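The abstract does not specify how the multi-algorithm collaboration framework in item 4 merges its repeated random propagations. One common way to combine partitions, used here as a swapped-in technique, is co-association (evidence accumulation) consensus. Under this rule, two nodes end up together when they share a community in more than a given fraction of the runs. The `runs` input is a hypothetical list of node-to-label maps, one per randomized propagation.

```python
from itertools import combinations
from typing import Dict, List


def merge_partitions(runs: List[Dict[str, int]], threshold: float = 0.5) -> Dict[str, int]:
    """Co-association consensus: link nodes that co-occur in more than
    `threshold` of the runs, then return the connected components."""
    nodes = sorted(set().union(*runs))
    parent = {v: v for v in nodes}

    def find(v: str) -> str:
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in combinations(nodes, 2):
        together = sum(1 for r in runs if u in r and v in r and r[u] == r[v])
        if together / len(runs) > threshold:
            parent[find(u)] = find(v)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    ids: Dict[str, int] = {}
    return {v: ids.setdefault(find(v), len(ids)) for v in nodes}
```

Comparing node pairs rather than label IDs means the runs need not number their communities consistently. The pairwise loop costs O(n^2 · runs), which is acceptable for a sketch but not for large networks.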
Keywords: data mining, density peak clustering, community detection, multi-algorithm collaboration framework