Improvement Of Density Peak Clustering Algorithm And Its Application

Posted on: 2023-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Zhan
GTID: 2568306800960779
Subject: Operational Research and Cybernetics
Abstract/Summary:
Data is an invisible asset of every industry in the new era. Mining the information contained in data can help an industry develop, transform, and upgrade rapidly. A clustering algorithm is an unsupervised data analysis tool that explores the structural information of a data set and the similarity between data samples, and processes massive data effectively by grouping the samples. Clustering is therefore widely used in fields such as pattern recognition, image analysis, information extraction, data compression, and network security.

Density peak clustering (DPC) is a density-based clustering method. The algorithm combines the local density and the relative distance of data sample points. It can partition data sets of arbitrary shape, it offers a novel strategy for assigning the remaining sample points, and it eliminates outliers. However, it has three disadvantages: first, the truncation (cutoff) distance must be set manually; second, the existing method for calculating local density is not reasonable enough; third, measuring the distance between data samples by simple geometric distance has shortcomings. On this basis, the main research contents of this thesis are as follows:

(1) Research on a multi-density peak clustering algorithm based on the L1 norm. To solve the problem that the DPC algorithm may recognize closely adjacent clusters as a single class, this thesis introduces the concept of a suspected cluster center into the original density peak clustering and studies a multi-density peak clustering algorithm based on the L1 norm. In this algorithm, the decision value of the logarithmic points of a probability distribution is used to select cluster centers, and data points are then assigned to them. Finally, the Warshall algorithm is used to merge suspected clusters to obtain the final clustering result.

(2) Research on a density peak clustering algorithm based on random walk. Because the original DPC algorithm uses only simple Euclidean distance to describe the distance between data points, this thesis discusses a density peak clustering algorithm that uses random walk to describe the distance between data points, so that the distance reflects the structural distribution of the data set. The experimental results show that the random-walk-based algorithm is effective on data sets whose clusters are widely distributed.

(3) Research on continuous attribute discretization based on the density peak clustering algorithm. Because density peak clustering is unsupervised, it can be applied to the discretization of continuous attributes, and this thesis carries out such a discretization. Density peak clustering needs no iteration when discretizing continuous attributes, so compared with the traditional method it shortens the running time and improves efficiency. Experiments verify its discretization effect.
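For orientation, the baseline DPC procedure that the abstract takes as its starting point (local density, relative distance, then assignment of the remaining points) can be sketched as follows. This is a minimal illustration of the commonly described original algorithm with a cutoff kernel, not the thesis's improved methods. The function name `density_peak_clustering`, the parameters `d_c` and `n_clusters`, and the choice to pick centers by the product of density and relative distance are assumptions made for this sketch.

```python
import numpy as np

def density_peak_clustering(points, d_c, n_clusters):
    """Baseline density peak clustering sketch (cutoff kernel, Euclidean distance).

    points: (n, d) array; d_c: manually chosen cutoff distance;
    n_clusters: number of centers to keep (hypothetical parameter).
    """
    n = len(points)
    # Pairwise Euclidean distances between all sample points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density: number of other points closer than the cutoff distance.
    rho = (dist < d_c).sum(axis=1) - 1

    # Visit points from densest to sparsest; ties are broken by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        # Relative distance: distance to the nearest point ranked as denser.
        delta[i] = dist[i, j]
        parent[i] = j

    # Assumed selection rule: the largest rho * delta values become centers.
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)

    # Each remaining point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

Calling `density_peak_clustering(points, d_c=0.5, n_clusters=2)` on a two-dimensional array returns one label per point together with the density and relative-distance values. The sketch makes the first disadvantage listed in the abstract visible: `d_c` has to be supplied by hand, and the result depends on it.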
Keywords/Search Tags: Clustering, peak density, random walk, unsupervised, discretization