In recent years, with the rapid development of Internet technology, massive amounts of interrelated data have emerged. As an abstraction of real complex systems, complex networks use network science to analyze the associations between individuals and the structure of the system. Community structure, one of the important characteristics of complex networks, aims to discover sets of closely connected nodes so that the topology of the network can be understood more clearly. Network embedding methods yield low-dimensional vector representations of nodes, which reduce the space cost of the algorithm, and introducing higher-order network information or community embedding can further improve the performance of community detection on this basis. Existing detection algorithms either face the problem of quantifying the weight of each relationship when using the multi-level information of the network, or ignore network homogeneity when community embedding is used as "supervisory information" to guide node embedding. Because of these problems, this thesis focuses on quantifying the weight of each level of relationship and on applying network homogeneity. The main research contents and achievements are as follows:

To quantify the weight of multi-level information in the network, the attention mechanism is introduced to learn the weight of each level of information, and a community detection algorithm based on the network's higher-order relation matrix is designed. The number of paths of each length between node pairs forms the relation matrix of the corresponding order. Based on the attention mechanism, we learn a context distribution consistent with the network topology as a weight combination, and build the higher-order relation matrix of the network by weighted summation. The objective function incorporates community embedding and modularity, and the parameters are updated alternately and iteratively with multiplicative update rules. After obtaining the optimal parameter combination on three datasets, this algorithm improves performance by 5.51% over the best DeepWalk result among the baseline algorithms.

To address the neglect of network homogeneity, homogeneity is introduced into the joint optimization of node embedding and community embedding, and a self-clustering algorithm based on node embedding and community embedding is designed. The community is regarded as a latent variable, and the node-community and community-node distributions are introduced. The loss function is optimized with a variational bound and Monte Carlo sampling to obtain the distribution of nodes over communities, and the concrete assignment of each node is then obtained with the Gumbel-Softmax trick. To satisfy the homogeneity of the network, edge weights are measured by the node-similarity metrics used in link prediction, which smooths the representations of connected nodes. The community centers are obtained by clustering the node embeddings, and a center loss, defined as the average distance between each node and its category center, is introduced to realize the self-clustering process of the algorithm. By calculation and comparison, this algorithm improves performance by 3.21% over the best DeepWalk result among the baseline algorithms.

To verify the ability of the above algorithms to detect communities, this thesis evaluates them on six datasets against eight baseline algorithms. An index, Louvain's Rate, is defined based on modularity, and the DeepWalk algorithm is used to verify the necessity of higher-order network information for detection performance. The experimental results show that the community detection algorithm based on the attention-weighted higher-order relation matrix and the
self-clustering algorithm based on node embedding and community embedding both show significant advantages over the baseline algorithms across the datasets.
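The higher-order relation matrix described above can be sketched as follows. This is a minimal illustrative sketch, not the thesis implementation: the k-th power of the adjacency matrix counts paths of length k between node pairs, and softmax-normalized attention scores weight each order before summation. The attention scores here are fixed placeholders; in the algorithm they would be learned so that the weight combination matches the network's context distribution.

```python
import numpy as np

def higher_order_matrix(A: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Combine A^1..A^K with softmax(scores) as attention weights."""
    K = len(scores)
    # A^k counts paths of length k between each pair of nodes.
    powers = [np.linalg.matrix_power(A, k) for k in range(1, K + 1)]
    w = np.exp(scores - scores.max())
    w /= w.sum()                      # softmax over the K orders
    return sum(wk * Pk for wk, Pk in zip(w, powers))

# Toy 4-node path graph, combining orders 1..3 with placeholder scores.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
M = higher_order_matrix(A, scores=np.array([1.0, 0.5, 0.1]))
print(M.shape)  # (4, 4)
```

Note that the second-order term already links nodes two hops apart (e.g. nodes 0 and 2 above), which is what lets the combined matrix expose community structure that the plain adjacency matrix misses.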
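The Gumbel-Softmax step of the second algorithm can likewise be sketched. This is an illustrative example only: the per-node community logits below are placeholders, whereas in the algorithm they come from the learned node-community distribution; the trick adds Gumbel noise to the logits and applies a temperature-scaled softmax, giving a near-one-hot, differentiable sample whose argmax yields each node's concrete community.

```python
import numpy as np

def gumbel_softmax(logits: np.ndarray, tau: float = 0.5, seed: int = 0) -> np.ndarray:
    """Draw a differentiable, near-one-hot sample per row of logits."""
    rng = np.random.default_rng(seed)
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    y = (logits + g) / tau                                # temperature-scaled
    y = np.exp(y - y.max(axis=1, keepdims=True))
    return y / y.sum(axis=1, keepdims=True)               # row-wise softmax

# Five nodes, three candidate communities (placeholder distribution).
logits = np.log(np.array([[0.7, 0.2, 0.1],
                          [0.6, 0.3, 0.1],
                          [0.1, 0.8, 0.1],
                          [0.2, 0.1, 0.7],
                          [0.1, 0.2, 0.7]]))
soft = gumbel_softmax(logits)
partition = soft.argmax(axis=1)   # hard community label per node
print(partition)
```

Lowering the temperature `tau` pushes each row closer to one-hot, which is why the trick can stand in for discrete community assignment while keeping the loss differentiable during training.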