| In the era of big data, large volumes of large-scale data with complex structure are being produced, such as social data, telecom user data, online e-commerce user data, and paper link data. These data can be modeled as networks, and discovering the clustering structure of networks has become a current research hotspot. Researchers have proposed many large-scale community discovery algorithms based on Spark, but these algorithms cannot find non-community structures in a network. The outline graph model can find all kinds of structures in a network, but its running efficiency and the scale of network it can process are limited. A probabilistic graph model based on Spark can efficiently handle large-scale network structure discovery. The discovered structure preserves the global structure of the network, which compensates for the tendency of network representation learning to ignore global structure. Community-preserving network representation learning not only maintains the first-order proximity of nodes but also takes the global topology into account. Large-scale network representation learning and network structure discovery therefore have important research value. Some researchers have proposed network structure discovery methods aimed at community discovery, but the network nodes they select are not valid for structure discovery in networks with mixed patterns, and their data processing efficiency is very poor on large-scale networks. A network with mixed patterns may have no community structure, or may instead contain other clustering structures, such as bipartite structures, star structures, or a mixture of several structures. It is therefore necessary to design a large-scale network structure discovery algorithm that improves structure discovery in networks with multiple clustering patterns. Network representation learning learns representations of network nodes, which can then be used in a later community discovery task. When analyzing 
the results of community discovery, the community structure can itself be represented, and node representations and community representations can be combined to improve the performance of network representation learning algorithms. This paper completes the following research: (1) The hybrid-structure-oriented network structure discovery algorithm NMM (Newman Mixture Model) cannot effectively discover hybrid network structures on large-scale network data sets under traditional single-machine serial computation. To address this, LNSES (Large scale network structure exploring algorithm on Spark), a large-scale network structure exploration algorithm on Spark, is proposed. The algorithm is built on the Spark platform and uses Spark’s distributed computing and storage to reduce both storage space and running time. Experimental results show that LNSES outperforms other similar network structure discovery algorithms in running time and in structure discovery accuracy. (2) The vGraph algorithm (a generative model) does not make full use of node similarity. To address this, this paper proposes NFVgraph (Node Feature Vgraph), a network representation learning algorithm that integrates the features of network nodes. The algorithm exploits the distribution similarity between network nodes. First, the network nodes are represented according to the distributional hypothesis. Then, using the similarity of the nodes themselves together with the community structure representation from vGraph, a loss term for node distribution similarity is added to the objective function. Finally, the optimal node representations and community representations are obtained by iteratively optimizing the objective function. Experimental results show that NFVgraph outperforms vGraph in subsequent data analysis tasks. |