Research On Density Peaks Clustering Algorithm For Complex Data

Posted on: 2022-09-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Xu
GTID: 1488306533468474
Subject: Computer application technology
Abstract/Summary:
Density peaks clustering (DPC) is a density-based clustering algorithm that quickly identifies cluster centers by mapping data objects onto a two-dimensional decision graph. Since it was proposed in 2014, DPC has attracted increasing attention in a wide range of clustering applications. It handles datasets with non-spherical clusters and identifies outliers effectively, without presetting the number of cluster centers, without iteration, and with only one input parameter. However, the theory of DPC is still immature, and the algorithm faces two main challenges on complex data: (1) constructing the full similarity matrix gives traditional DPC a high computational complexity (O(n^2) for n samples), which makes it unsuitable for data with large sample sizes; (2) traditional DPC is not robust and cannot handle data with multiple density peaks or data with high feature dimensionality. To address these problems, this thesis analyzes in depth the shortcomings of DPC on large-scale, multi-density-peak, and high-dimensional data and studies each of them systematically. The specific research contents are as follows:

1. Research on sampling methods for density peaks clustering. To address the high computational complexity of traditional DPC on large-scale data, this thesis proposes fast density peaks clustering based on pre-screening. Specifically, we design two pre-screening strategies, one based on grid division and one based on circle division, that select data objects with higher local density; together they provide a universal sampling method for density peaks clustering. On this basis, two fast density peaks clustering algorithms are proposed that quickly identify cluster centers among the objects with higher local density, effectively reducing the computational complexity.

2. Research on density peaks clustering for large-scale data. Traditional DPC has high computational complexity because of the construction of the similarity matrix. Existing methods reduce this complexity effectively, but they lower clustering precision and introduce complex parameters. To balance clustering precision and computational complexity, this thesis proposes fast density peaks clustering based on sparse search. Specifically, we design a dissimilarity-based sparse search strategy to find nearest neighbors. On this basis, a density peaks clustering algorithm for large-scale data is proposed that measures similarity only between nearest neighbors, balancing clustering precision against computational complexity.

3. Research on density peaks clustering for multi-density-peak data. Because traditional DPC cannot obtain an ideal partition of multi-density-peak data, this thesis proposes density peaks clustering based on a feedback strategy. Specifically, we design a feedback strategy that merges sub-clusters based on support vectors. On this basis, a density peaks clustering algorithm for multi-density-peak data is proposed, consisting of two main steps: dividing sub-clusters and merging sub-clusters. The algorithm greatly reduces the influence of cluster-center selection and improves the clustering of multi-density-peak data.

4. Research on deep density clustering for high-dimensional data. Because traditional DPC cannot handle high-dimensional data, this thesis proposes semi-supervised deep density clustering. Specifically, we use convolutional autoencoders to extract data features and design a semi-supervised density peaks clustering algorithm to identify stable cluster centers. A joint clustering loss that integrates a small amount of prior information is then defined to perform feature representation and cluster assignment simultaneously, improving the clustering of high-dimensional data.

Extensive experiments on multiple types of datasets show that the density peaks clustering algorithms proposed in this thesis form an efficient and robust system, which enriches research on clustering analysis and provides theoretical and technical support for image recognition. The thesis has 49 figures, 24 tables, and 174 references.
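For readers unfamiliar with the decision graph, the baseline procedure can be sketched as follows. This is a minimal NumPy sketch of standard DPC as originally proposed in 2014, not one of the thesis's algorithms. It assumes a cutoff-kernel local density with cutoff distance `dc` as the single input parameter and, as an assumption made for automation, takes the `k` points with the largest product rho * delta as centers instead of picking them by eye from the decision graph. The function name `dpc` and the parameter `k` are illustrative.

```python
import numpy as np

def dpc(X, dc, k):
    """Baseline density peaks clustering (illustrative sketch).

    X  : (n, d) array of samples
    dc : cutoff distance for the local density
    k  : number of centers (assumption: the k largest rho * delta values)
    """
    n = X.shape[0]
    # Full pairwise Euclidean distance matrix: O(n^2) time and memory.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density rho_i: number of other points closer than dc (cutoff kernel).
    rho = (dist < dc).sum(axis=1) - 1

    # Rank points by decreasing density; stable sort breaks ties by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()
    for pos in range(1, n):
        i = order[pos]
        higher = order[:pos]                  # all points ranked denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]                 # distance to nearest denser point
        nearest_higher[i] = j

    # Decision graph: centers have both large rho and large delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf                  # densest point has no denser neighbour
    centers = np.argsort(-gamma)[:k]
    labels = np.full(n, -1)
    labels[centers] = np.arange(k)

    # Assign the remaining points, in decreasing density order,
    # to the cluster of their nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta, centers
```

On two well-separated Gaussian blobs, for example, `dpc(X, dc=0.5, k=2)` should give one label per blob. The dense `dist` matrix is the O(n^2) step that the pre-screening and sparse-search strategies described in the abstract aim to avoid.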
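The grid-division pre-screening in item 1 can be illustrated loosely with the hypothetical sketch that follows. It is not the thesis's strategy, whose details the abstract does not give: it assigns each point to a square grid cell of side `cell` and keeps the points whose cell holds at least `min_count` points, as a crude stand-in for "data objects with higher local density". The function `grid_prescreen` and both of its parameters are invented for illustration.

```python
import numpy as np

def grid_prescreen(X, cell, min_count):
    """Hypothetical grid-based pre-screening (illustration only).

    Returns the indices of points whose grid cell contains at least
    min_count points.
    """
    # Integer cell coordinates for every point, with the grid anchored at the data minimum.
    cells = np.floor((X - X.min(axis=0)) / cell).astype(np.int64)
    # Group identical cell coordinates; `inverse` maps each point to its cell.
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = np.asarray(inverse).ravel()  # inverse shape varies across NumPy versions
    # Keep points that sit in sufficiently populated cells.
    keep = counts[inverse] >= min_count
    return np.flatnonzero(keep)
```

Center search could then run on the retained subset only. Grouping the cells with `np.unique` costs roughly O(n log n), compared with O(n^2) for a full distance matrix, which is the kind of saving a pre-screening step is meant to provide.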
Keywords/Search Tags: clustering, density peaks clustering, deep clustering, algorithm complexity, algorithm robustness