Font Size: a A A

Approaches To Exploring General Communities In Massive Networks Based On Generative Models

Posted on:2016-06-03Degree:DoctorType:Dissertation
Country:ChinaCandidate:B F ChaiFull Text:PDF
GTID:1228330470955945Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
As the growth of the Internet technology and digitization technology, a number of social networks emerge, such as Twitter, LinkedIn, Facebook, Sina Microblog, Ren Ren, which become ubiquitous platforms of our everyday life, work and learning. These online social networks generate massive complexity contents and links every day. The nature structures and interaction rules of social networks underlie these linked data. Content data provide rich texts, images and other kinds of multimedia data for data min-ing. Exploring and analyzing latent structures of these networks provide unprecedented challenges and opportunities for different areas.Traditional community detection is the most popular approach to latent structure ex-ploring and analyzing in current. It assumes that there is only the community structure with dense internal connections and sparse external ones in a network. However, people have little prior about whether there is community structure or there are other types of structures underlying online social networks. Thus, traditional community detection may be invalid when there is no community or there are other types of structures. Stochastic block model (SBM) for network structural exploration is able to explore network struc-ture (such as traditional community structure, bipartite structure, star structure et al) by modeling networks with a wide variety of structure. This type of structure assumes nodes belong to a cluster if they have the same link probability with other nodes. The structure include traditional community structure. The similarity of nodes in a cluster defined by these probabilistic models is more general than the definition of traditional community with dense links in the same cluster and sparse links between two clusters, bipartite struc-ture with sparse links in the same cluster and dense links between two clusters, and other complex structure. This kind of structure is named as’general structure’. The existing probabilistic models and parameter estimating methods of general community exploring are not enough efficient to explore the structure underlying massive complex network-s. Thus, it is necessary to study approaches to general community detection based on probabilistic generative models. This paper focuses on how to model the features of net-works and how to design efficient and accurate parameter estimating methods. The main contributions of this dissertation are set as follows:1)A fast algorithm for general community exploring based on general stochastic block model (FGSB) is provided. The GSB (general stochastic block) model is provided to detect general structure based on the idea of link community. But its complexity limits its applications in medium and large networks. In order to improve the running efficiency of parameter estimating algorithm of GSB, a fast algorithm on the GSB model (FGSB) is designed to explore general structure in networks faster. FGSB dynamically learns the parameters related to the network structure in the process of iterations according to two strategies. It reduces the storage memory by reorganizing parameters to cut down unnecessary parameters, and saves the running time by pruning the related parameters of converging nodes and edges to decrease the computing time of each iteration. Experi-ments illustrate that FGSB has the same ability of structure detection as the GSB model, but its complexities of time and storage are lower.2) A Productivity Popularity Stochastic Block model (PPSB) is provided for general community exploring, which is then combined with Discriminative Content model(PPSB-DC) for general community exploring on text-associated networks. The existed stochastic block model can not model power-law distribution of nodes, over-lapping of nodes and general community coherently, making it fit real networks with power-law degree distribution badly. Thus, the PPSB model considers four factors in the generative process of a network, including node productivity, node popularity, node mix-memberships and link probability among communities. The PPSB model has the flexibility in discovering general community structures and modeling the real scale-free networks with power law degree distributions. These advantages improve the accuracy of general structure exploration. In order to make use of rich texts of networks, we combine the topology structure and content information and provide the combined model PPSB-DC. The PPSB-DC model improves the accuracy of general structure exploration better. Experiments demonstrate the PPSB model and the PPSB-DC model are valid.3)A three-level hierarchical Bayesian model for general community exploration (Generalized PPSB) is presented. Then a parameter estimating algorithm based on s-tochastic variational algorithm is developed. Although the PPSB model can model gen-eral structures of networks and the power-law degree distribution simultaneously, it pro-vides no generative model for the membership of nodes and link probability between clusters. This makes the PPSB model easy over-fitting as the training data increases, and be difficult for link prediction on entities of a network outside of the training set. In ad-dition, the existing parameter estimating algorithm is unable to deal with large networks. We design a three-level hierarchical Bayesian model GPPSB, which draws the prior gen- erative models of memberships of nodes and link probabilities between clusters. Then a scalable algorithm based on stochastic variational inference is developed for posterior inference. Compared with the state-of-the-art probabilistic approaches, our algorithm explores general structure faster and better.4)An online EM algorithm Online-VEM(online Variational EM) for general struc-ture exploration based on mixture models is provided. On the one hand, the existed online EM algorithm OEM based on the SBM model can explore general structure efficiently, but its complexity (O(mK2), where m is the number of edges and K is the number of clus-ter) is high when there are many clusters in a network. On the other hand, the complexity on the number of community of mixture models NMM and its variant ENMM (O(mK)) is linear, but its EM algorithm runs on the whole network at each iteration, making this algorithm unable to deal with massive networks efficiently. An online variational EM algorithm based on ENMM (Online-VEM) is presented. The algorithm first estimates the posteriors of cluster membership of a new added node, then it updates current model parameters in terms of parameters from the previous iteration and the posteriors of the new node. Compared to the batch EM algorithm of mixture models and OEM, our online algorithm costs less with little or no degradation of performance for structure exploratory.
Keywords/Search Tags:social network analysis, massive networks, stochastic block model, s-tochastic variational inference, online algorithm, general community de-tection
PDF Full Text Request
Related items