The advent of the big data era has brought a growing number of complex network structures. These networks are high-level abstractions of complex systems and are crucial for studying both network topology and node properties. A complex network can be represented by a graph model consisting of nodes and edges. Its most important features are the small-world property, the scale-free property, and community structure.

Community structure describes how nodes aggregate into groups. Nodes in the same community are densely connected to one another, while nodes in different communities are relatively weakly connected. Mining this property is known as community discovery. It groups nodes into communities according to how closely each node is connected to its neighbors. Community discovery is essential for understanding the function and organization of complex network systems. Real network structures are increasingly complex, and a node often belongs to several groups at once, so overlapping community discovery is of great practical importance.

This thesis optimizes and improves the classical COPRA overlapping community discovery algorithm. COPRA selects its initial nodes randomly and also updates labels with a degree of randomness. To address both problems, this thesis proposes NI-COPRA, an algorithm that integrates node influence. It takes full account of the individual attributes of nodes and of the association between nodes. The algorithm works in three steps:

1. The EnRenew algorithm, which is based on information entropy, measures node importance and thereby determines the order in which node labels are updated.
2. A node influence measure that combines node importance and node similarity is used to compute community affiliation coefficients during label propagation.
3. Overlapping communities are obtained from the resulting node labels.

The first step addresses a limitation of traditional importance ranking algorithms, which consider only the degree of a node or only its first-order neighbors. It also removes the randomness in selecting initial nodes in COPRA. To do so, this thesis uses the EnRenew importance ranking algorithm as the measure of node importance. It yields an importance ranking of the nodes in the network, from which the propagation path of the nodes is derived. Experiments with the SIR (Susceptible-Infected-Recovered) model confirm that this ranking algorithm has certain advantages in network propagation.

The second step addresses the randomness of label propagation in COPRA by introducing the concept of node influence. The definition of node influence takes full account of node importance, that is, the individual attributes of each node. It also introduces the correlation between nodes, that is, node similarity. To compute similarity, the Node2vec model generates node sequences through random walks. The Skip-gram model is then trained on these sequences to obtain a vector representation of each node. The similarity between two nodes is the cosine similarity of their vectors.

In the final step, node influence defined in this way is incorporated into the node affiliation coefficients. The final overlapping communities are obtained by selecting labels according to these coefficients.

The experiments use nine real network datasets and two groups of artificially generated network datasets. First, the vote algorithm, the k-shell algorithm, and the degree algorithm are used as baselines on six real networks to verify the effectiveness of the importance ranking algorithm, which performs well on all six datasets. Second, the traditional COPRA algorithm and the LPANNI algorithm are used as baselines for community discovery on the nine real networks and the two groups of artificial networks. The experimental results show that NI-COPRA outperforms the compared algorithms on both EQ (extended modularity) and NMI (normalized mutual information). It also improves the accuracy of overlapping community discovery, further demonstrating the effectiveness of the proposed algorithm.
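The entropy-based importance ranking can be illustrated with a minimal sketch. It makes several assumptions:

- The graph is undirected and given as a dict of adjacency lists.
- The information entropy of node v is e_v = -Σ_{u∈N(v)} I_u ln I_u, where I_u = d_u / Σ_{w∈N(v)} d_w.
- Renewal follows a simplified rule that is only an assumption here, not necessarily the exact EnRenew update. After a node is selected, each unselected node at hop distance d ≤ `max_dist` loses `decay_base / 2**(d-1)` entropy, and `decay_base` defaults to the log of the average degree.

All function names are hypothetical.

```python
import math
from collections import deque


def node_entropy(adj):
    """Information entropy of each node, computed from its neighbours' degrees."""
    entropy = {}
    for v, nbrs in adj.items():
        total = sum(len(adj[u]) for u in nbrs)
        e = 0.0
        for u in nbrs:
            p = len(adj[u]) / total  # share of u in v's neighbourhood degree
            e -= p * math.log(p)
        entropy[v] = e
    return entropy


def hop_distances(adj, src, max_dist):
    """Breadth-first search distances from src, up to max_dist hops."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if dist[v] == max_dist:
            continue
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def entropy_renewal_ranking(adj, max_dist=2):
    """Repeatedly pick the highest-entropy node, then lower the entropy of
    nodes near it (simplified renewal rule, an assumption of this sketch)."""
    entropy = node_entropy(adj)
    avg_deg = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
    decay_base = math.log(avg_deg) if avg_deg > 1 else 0.0  # assumed decay amount
    remaining = set(adj)
    order = []
    while remaining:
        # max() keeps the first maximum, so ties go to the smallest node id.
        best = max(sorted(remaining), key=entropy.get)
        order.append(best)
        remaining.remove(best)
        for u, d in hop_distances(adj, best, max_dist).items():
            if d >= 1 and u in remaining:
                entropy[u] -= decay_base / 2 ** (d - 1)
    return order
```

The renewal step spreads out the top-ranked nodes. Without it, a hub's neighbours would rank just below the hub and the selected seeds would cluster together. With it, the ranking favors nodes that cover different parts of the network.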
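The SIR-based validation of a ranking can be sketched as a discrete-time simulation. This sketch rests on the following assumptions:

- At each step, every infected node infects each susceptible neighbour with probability `beta` and recovers with probability `gamma`.
- The spreading ability of a seed set is the average final fraction of recovered nodes.

The values of `beta`, `gamma` and `runs` are placeholders, not the settings used in the thesis.

```python
import random


def sir_spread(adj, seeds, beta=0.1, gamma=1.0, rng=None):
    """One discrete-time SIR run; returns how many nodes were ever infected
    (all of them are recovered once the epidemic dies out)."""
    rng = rng or random.Random()
    infected = set(seeds)
    recovered = set()
    while infected:
        newly_infected = set()
        for v in infected:
            for u in adj[v]:
                susceptible = u not in infected and u not in recovered
                if susceptible and u not in newly_infected and rng.random() < beta:
                    newly_infected.add(u)
        still_infected = set()
        for v in infected:
            if rng.random() < gamma:
                recovered.add(v)
            else:
                still_infected.add(v)
        infected = still_infected | newly_infected
    return len(recovered)


def mean_spread(adj, seeds, runs=1000, beta=0.1, gamma=1.0, seed=0):
    """Average fraction of the network reached over many SIR runs."""
    rng = random.Random(seed)
    n = len(adj)
    total = sum(sir_spread(adj, seeds, beta, gamma, rng) for _ in range(runs))
    return total / (runs * n)
```

To compare ranking methods, take the top-k nodes of each ranking as the seed set and compare their `mean_spread` values under the same parameters and random seed.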
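The Node2vec sequence generation can be sketched as a second-order biased random walk. The first step is uniform. After that, the walk returns to the previous node with weight 1/p, moves to a common neighbour of the previous node with weight 1, and moves further away with weight 1/q. This sketch uses rejection-free weighted sampling with `random.choices` rather than the alias tables of the original Node2vec implementation. The function names and default parameter values are assumptions.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """One second-order (node2vec-style) random walk starting at `start`."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(adj[cur])
        if not nbrs:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        prev_nbrs = set(adj[prev])
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif x in prev_nbrs:
                weights.append(1.0)  # stay close to prev (BFS-like)
            else:
                weights.append(1.0 / q)  # move outward (DFS-like)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks=10, length=20, p=1.0, q=1.0, seed=0):
    """Walk corpus: `num_walks` passes over all nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for v in nodes:
            walks.append(node2vec_walk(adj, v, length, p, q, rng))
    return walks
```

The walks play the role of sentences for Skip-gram training. For example, they can be converted to lists of strings and passed to gensim's `Word2Vec` with `sg=1` to obtain one embedding vector per node.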
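The similarity and influence computation can be sketched as follows. The cosine similarity is standard. The abstract does not give the exact formula that combines importance and similarity, so `node_influence` below is purely hypothetical. It mixes the neighbour's normalised importance with the pairwise similarity rescaled from [-1, 1] to [0, 1], using an assumed weight `alpha`. Importance scores are assumed to be non-negative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def node_influence(u, v, importance, embeddings, alpha=0.5):
    """Hypothetical influence of neighbour u on node v: a weighted mix of u's
    normalised importance and the rescaled embedding similarity of u and v."""
    max_imp = max(importance.values())
    imp = importance[u] / max_imp if max_imp > 0 else 0.0
    sim = (1.0 + cosine_similarity(embeddings[u], embeddings[v])) / 2.0  # [-1, 1] -> [0, 1]
    return alpha * imp + (1.0 - alpha) * sim
```

Under this form, an important neighbour with a similar embedding has the largest say in a node's labels. The measure is asymmetric because only u's importance enters the score.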
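Incorporating influence into COPRA's affiliation coefficients can be sketched as a weighted label propagation. Each node keeps a set of (label, belonging coefficient) pairs summing to 1, and labels whose coefficient falls below 1/v are dropped, as in COPRA, where v is the maximum number of communities per node. Several choices here are assumptions of this sketch:

- Neighbour contributions are weighted by a caller-supplied `influence(u, v)` function.
- Nodes are updated asynchronously in a given importance order.
- Ties are broken deterministically.
- Iteration stops when a full sweep changes nothing. This is a simplification of COPRA's own stopping criterion.

```python
def propagate_labels(adj, order, influence, v_max=2, max_sweeps=100):
    """COPRA-style overlapping label propagation in which each neighbour's
    belonging coefficients are weighted by a (hypothetical) influence score."""
    labels = {v: {v: 1.0} for v in adj}  # every node starts in its own community
    threshold = 1.0 / v_max
    for _ in range(max_sweeps):
        changed = False
        for v in order:  # nodes are updated in importance order
            scores = {}
            for u in adj[v]:
                w = influence(u, v)
                for c, b in labels[u].items():
                    scores[c] = scores.get(c, 0.0) + w * b
            total = sum(scores.values())
            if total == 0.0:
                continue  # isolated node or zero influence: keep current labels
            scores = {c: s / total for c, s in scores.items()}
            kept = {c: b for c, b in scores.items() if b >= threshold}
            if not kept:
                # Keep only the strongest label; ties go to the smallest label
                # here, whereas COPRA breaks them randomly.
                best = max(sorted(scores), key=scores.get)
                kept = {best: 1.0}
            norm = sum(kept.values())
            new_labels = {c: b / norm for c, b in kept.items()}
            if new_labels != labels[v]:
                labels[v] = new_labels
                changed = True
        if not changed:
            break  # simplified stopping rule: a full sweep changed nothing
    return labels


def communities_from_labels(labels):
    """Group nodes by community label; a node with several labels overlaps."""
    communities = {}
    for v, lab in labels.items():
        for c in lab:
            communities.setdefault(c, set()).add(v)
    return list(communities.values())
```

Passing `lambda u, v: 1.0` as `influence` reduces the update to unweighted COPRA-style averaging with a fixed update order. Passing an importance- and similarity-based score gives the influence-weighted variant.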
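The EQ metric used in the evaluation extends modularity to overlapping communities. It weights each node pair by 1/(O_u O_v), where O_u is the number of communities that contain node u:

EQ = 1/(2m) · Σ_C Σ_{u,v∈C} [A_uv − k_u k_v / (2m)] / (O_u O_v)

A direct sketch for an undirected, unweighted graph is given below. The function name is hypothetical, and NMI for overlapping covers is not included.

```python
def extended_modularity(adj, communities):
    """EQ for overlapping communities of an undirected, unweighted graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2m = sum of degrees
    overlap = {}
    for comm in communities:
        for v in comm:
            overlap[v] = overlap.get(v, 0) + 1  # O_v: communities containing v
    eq = 0.0
    for comm in communities:
        for u in comm:
            nbrs_u = set(adj[u])
            for v in comm:  # ordered pairs, including u == v
                a_uv = 1.0 if v in nbrs_u else 0.0
                expected = len(adj[u]) * len(adj[v]) / two_m
                eq += (a_uv - expected) / (overlap[u] * overlap[v])
    return eq / two_m
```

When no node belongs to more than one community, every O_u equals 1 and EQ reduces to ordinary Newman modularity. For example, two disjoint triangles, each taken as one community, give EQ = 0.5.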