Network data is a data structure consisting of a collection of nodes and a set of edges. Nodes represent entities or objects, and edges represent relationships or connections between nodes. Network data plays an important role in modern technology, and nonlinear data from many domains can be represented as graph data. A community is defined as a densely connected subgraph of a graph, in which nodes are more closely linked to each other than to nodes outside the subgraph. Community discovery aims to identify the community structure of a graph and assign nodes to different communities. With the development of the Internet and big data, community detection has come to play an important role in graph data mining, and the processing of nonlinear data faces higher requirements. In this paper, we study the community discovery problem on static networks and propose a distance matrix based on a topological scoring mechanism that uses the topology of the data. This allows the tightness of node connections to be used in the community discovery task, and we combine it with the density peak clustering algorithm to solve that task. In addition, graph representation learning is often used to solve community discovery problems on large-scale sparse networks. Working from the node vectors output by graph representation learning, we improve how well the clustering algorithm fits the graph representation learning algorithm in order to improve community discovery metrics. Details are as follows:

1. A Topology-based and Improved Density Peaking Algorithm (TIDPC), built on a topology scoring mechanism, is proposed to solve the non-overlapping community discovery problem. The algorithm redefines the similarity matrix: it considers the topology of the graph dataset itself through the reachability matrix between nodes and fully incorporates the higher-order information of nodes. It normalizes the adjacency matrix and the reachability matrix to generate the similarity matrix. Starting from the original definition of local density based on the truncation distance, node degree is incorporated into the calculation to improve the accuracy with which the algorithm assigns nodes to communities. Experimental results show that TIDPC effectively partitions static non-overlapping communities, with strong robustness and accuracy on both artificially generated networks and real-world network data.

2. A Density Peaking Algorithm for Improved Local Density (ILDPC) with adaptive node vector distribution is proposed to solve the community discovery problem based on graph representation learning. Existing methods treat graph representation learning and clustering as separate stages, so the distribution of the node vectors output by the graph representation learning algorithm is unpredictable. To address this, the algorithm improves the local density to fit the distribution of the low-dimensional vectors output by the network and then clusters the nodes. On this basis, an adaptive truncation distance algorithm is proposed that automatically selects the truncation distance used as input to the algorithm by analyzing the distance matrix formed by the existing vector distribution. The algorithm is combined with the GraphSAGE model and evaluated on artificial and real-world datasets. Decision plots show that the algorithm identifies the peak points of clusters more effectively, and F1-scores show that it has an advantage over other clustering algorithms on the community discovery task.
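For orientation, the standard density peak clustering procedure that both contributions build on can be sketched on a small graph. This is only an illustrative baseline under stated assumptions, not the TIDPC or ILDPC algorithm: the distance used here is the plain shortest-path hop count, the local density is the unweighted count of nodes within the truncation distance, and the names `hop_distances`, `density_peaks`, `d_c`, and `k` are hypothetical. The topology-scored similarity matrix, the degree-aware density, and the adaptive truncation distance described above are not specified in this abstract and are not reproduced.

```python
from collections import deque


def hop_distances(adj):
    """All-pairs shortest-path hop counts via BFS.

    Unreachable pairs keep the sentinel value len(adj), which no real
    path can reach (the longest simple path has len(adj) - 1 edges).
    """
    n = len(adj)
    dist = [[n] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] == n:  # not visited yet
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def density_peaks(dist, d_c, k):
    """Baseline density peak clustering on a precomputed distance matrix.

    d_c is the truncation distance and k the number of communities;
    both are supplied by hand in this sketch.
    """
    n = len(dist)
    # Local density: number of other nodes within the truncation distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] <= d_c)
           for i in range(n)]
    # Nodes in order of decreasing density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {node: pos for pos, node in enumerate(order)}

    delta = [0] * n     # distance to the nearest node that ranks higher
    parent = [-1] * n   # that nearest higher-ranked node
    delta[order[0]] = max(max(row) for row in dist)  # densest node
    for pos in range(1, n):
        i = order[pos]
        j = min(order[:pos], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j

    # Centers: the k nodes with the largest rho * delta.
    centers = sorted(range(n),
                     key=lambda i: (-rho[i] * delta[i], rank[i]))[:k]
    labels = [-1] * n
    for c_id, c in enumerate(centers):
        labels[c] = c_id
    # Each remaining node inherits the label of its parent, which has
    # already been labeled because nodes are visited in density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Toy graph: two 4-cliques {0,1,2,3} and {4,5,6,7} joined by edge 3-4.
adj = [
    [1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2, 4],
    [3, 5, 6, 7], [4, 6, 7], [4, 5, 7], [4, 5, 6],
]
labels, centers = density_peaks(hop_distances(adj), d_c=1, k=2)
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
print(centers)  # [3, 4]
```

With a truncation distance of one hop, the density in this sketch reduces to the node degree, and integer hop counts produce many ties that are resolved only by node order. A finer-grained distance and density definition, such as the ones summarized above, would be expected to separate candidates more cleanly, although that is not demonstrated here.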