
Incomplete Data And Heterogeneous Community Mining Based On Density Peaks

Posted on: 2022-12-19 Degree: Master Type: Thesis
Country: China Candidate: K Gao
GTID: 2518306776492564 Subject: Management Science
Abstract/Summary:
Density clustering is widely used in pattern recognition, information retrieval, image analysis, complex network analysis, and many other fields to identify the hidden structure of real-world data sets. However, the density peak clustering algorithm can only handle structured, complete data, and it performs poorly in many cases. First, real-world data often contain missing or erroneous values. For such incomplete data sets, the usual approach is to impute the missing values and then apply a traditional clustering method, which lowers accuracy; moreover, the "aggregation phenomenon" of imputed data points can cause density peak clustering to fail. Second, semi-structured data are commonly modeled as complex networks. Community mining is an important technique for discovering the hidden structure of a complex network, and clustering algorithms have long been applied to it. Current community mining research focuses mainly on homogeneous networks, while heterogeneous networks remain insufficiently studied. This thesis applies the density peak clustering algorithm to both scenarios, studying incomplete data clustering and heterogeneous network community mining based on density peaks. The main contributions are as follows:
· Through experiments, this thesis verifies and analyzes the shortcomings of the density peak clustering algorithm, examining the difficulties of applying it to incomplete data clustering and to community mining in heterogeneous information networks.
· For incomplete data, this thesis identifies the "aggregation phenomenon" of imputed points: imputed points gather together and mislead the algorithm into choosing wrong cluster centers, which degrades the clustering result. It proposes combining clustering with classification to address this problem and designs the DPC-INCOM algorithm, which handles the clustering of incomplete data sets with clusters of arbitrary shape. Experiments on benchmark data sets show that DPC-INCOM is more stable and more accurate than the original density peak algorithm, and that it outperforms the best imputation-based method by up to 39.3%.
· For heterogeneous network community mining, previous researchers have combined the density peak algorithm with homogeneous networks, and their algorithm has advantages over other homogeneous graph community mining algorithms. This thesis combines the density peak algorithm with heterogeneous network community mining in order to obtain the same advantages. It proposes the DPCHIN algorithm, which introduces a node similarity measure based on weighted metapaths (experiments demonstrate the benefit of using multiple metapaths), defines how local density and distance are computed in a heterogeneous network, and reassigns the community labels of some nodes through community merging and neighbor vectors to improve community quality. Experimental results on four real heterogeneous information network datasets show the effectiveness of the proposed algorithm.
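For readers unfamiliar with density peaks, the following is a minimal Python/NumPy sketch of the standard density peak clustering procedure (cutoff-kernel variant): each point gets a local density ρ, a distance δ to its nearest denser point, centers are the points with the largest ρ·δ, and every other point inherits the label of its nearest denser point. It is not DPC-INCOM or DPCHIN, whose details are not given in this abstract. The toy data, the naive column-mean imputation step, the cutoff `dc=1.0`, and the helper names `pairwise_dist` and `density_peaks` are all assumptions for illustration. The second run shows one way the "aggregation phenomenon" can arise: identical imputed records form a spurious dense peak that is picked as a center.

```python
import numpy as np


def pairwise_dist(X):
    """Euclidean distance matrix for the rows of X (hypothetical helper)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def density_peaks(dist, dc, n_clusters):
    """Standard density peak clustering on a precomputed distance matrix.

    Returns (labels, centers, rho, delta). Illustrative sketch only.
    """
    n = dist.shape[0]
    # Local density rho_i (cutoff kernel): number of other points closer than dc.
    # Subtract 1 because dist[i, i] == 0 counts the point itself.
    rho = (dist < dc).sum(axis=1) - 1
    # Visit points from densest to sparsest; the stable sort breaks ties by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for k, i in enumerate(order):
        if k == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = dist[i].max()
            continue
        higher = order[:k]  # points visited earlier (density >= rho_i)
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    # Decision value gamma = rho * delta; the largest values become centers.
    gamma = rho * delta
    # Sketch choice: always keep the densest point as a center so that every
    # nearest-higher chain ends at a labeled center.
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Non-center points inherit the label of their nearest denser point; visiting
    # in decreasing density guarantees that point is already labeled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers, rho, delta


# Toy data (assumed): two tight 5-point blobs far apart.
blob = np.array([[0, 0], [0.5, 0], [0, 0.5], [0.5, 0.5], [0.25, 0.25]])
X = np.vstack([blob, blob + 10.0])

labels_clean, centers_clean, _, _ = density_peaks(pairwise_dist(X), dc=1.0, n_clusters=2)
print(centers_clean.tolist())  # [0, 5]
print(labels_clean.tolist())   # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Six records with every feature missing, filled by column-mean imputation
# (an assumed, deliberately naive baseline). All six land on the same point.
X_missing = np.vstack([X, np.full((6, 2), np.nan)])
col_mean = np.nanmean(X_missing, axis=0)
X_imp = np.where(np.isnan(X_missing), col_mean, X_missing)

labels_imp, centers_imp, rho_imp, _ = density_peaks(pairwise_dist(X_imp), dc=1.0, n_clusters=2)
print(rho_imp.tolist())      # real points have density 4, imputed points 5
print(centers_imp.tolist())  # [10, 0]: an imputed point displaces the second blob's center
print(labels_imp.tolist())
```

In the clean run each blob yields one center. After imputation, the six identical imputed points are the densest in the data set, one of them becomes a center, and the second blob is absorbed into the imputed pile. Because `density_peaks` accepts any precomputed distance matrix, a network setting could in principle supply distances derived from a node similarity (for example 1 − similarity); how DPCHIN actually builds its weighted-metapath similarity, density, and distance is not specified here.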
Keywords/Search Tags: Density Peaks, Incomplete Data, Heterogeneous Information Networks, Community Discovery, Clustering