Font Size: a A A

Research On Some Key Issues Of Statistical Network Models

Posted on:2015-12-20Degree:DoctorType:Dissertation
Country:ChinaCandidate:X H ZhaoFull Text:PDF
GTID:1228330467956784Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Various relationships in real world are described by complex networks, and the study ofcomplex networks has the important theoretical significance and application value. As one ofthe major topics in the areas of complex network and machine learning, statistical networkmodels play a huge role in understanding the structural characteristics, generative mechanismand evolution of complex network, and exploring the nature behind the networks. In recentyears, stochastic blockmodel and random walk model have attracted wide attention ofscholars and have become the research focus in the areas of complex networks, a number ofresearch results have indicated that two models have the excellent performance in structuremining, link prediction, node ranking, and so on. However, due to the complexity of complexnetworks/complex systems, there are many statistical modeling and inferential challenges inthe analysis of network data.This thesis focuses on statistical network models, especially, stochastic blockmodel andrandom walk model, some key issues related to two models, such as network modeling,statistical inference, model application, are researched deeply. For the drawbacks of theexisting SBMs and their learning algorithms, some innovative ideas are proposed; For randomwalk model, some novel solution about the fusion between community mining and noderanking are proposed. The main contributions in the thesis are as follows:1) To address the existing stochastic blockmodel failing to handle signed networks, a novelsigned stochastic blockmodel and its learning algorithm are proposed. To the best of ourknowledge, it is the first method to find the communities in the signed networks by usingstatistical model and variational inference. Currently, the existing SBMs can efficiently handlethe direction and weight of links, but they fail to handle the sign of links. For solving thequestion, first, based on the SBM, a novel signed stochastic blockmodel is proposed to modelthe signed networks for community detection. For the proposed model, the positive links andnegative links are modeled from the perspective of network noise, so that the problem ofsigned networks modeling is efficiently solved and communities can be efficiently found inthe signed networks. In addition, the proposed model is a generative model of signednetworks, which can produce the synthetic signed networks with different characteristics ofcommunity. To efficiently learn the model, a learning algorithm with the ability of modelselection is proposed. The proposed algorithm adopts the variational Bayes EM to estimatethe proximate distribution of variable and parameters, and based on these learned proximatedistributions, a new model selection criterion is derived to evaluate the models. Comparedwith the existing algorithms, the proposed algorithm can not only accurately findcommunities in signed networks without any priori structure knowledge, but also be applied to the areas of sign prediction and analyzing the structural balance.2) To address the high time complexity of the existing stochastic blockmodel learningalgorithm with model selection, a novel block-wise stochastic blockmodel learning algorithmis proposed. To the best of our knowledge, it is the first algorithm to achieve the parallellearning of parameter estimation and model selection by adopting block-wise learningmechanism. Since the current learning algorithms adopt the model-wise serial learningmechanism, a large amount of computing cost is taken to re-compute the learned informationso that these algorithms have the extremely high time complexity and only handle the smallnetworks with hundreds of nodes. To address the problem, first, a fine-gained stochasticblockmodel is proposed, which can capture more detailed structure than standard stochasticblockmodel and improve the generalization of model by replacing block connection matrixwith block-node connection matrix. It is more important for the proposed model to provide anefficiently learning way to reduce the time complexity. Then, based on the fine-gainedstochastic blockmodel, a block-wise stochastic blockmodel learning algorithm is presented.The proposed learning algorithm integrates the minimum message length criterion into EMalgorithm to achieve the block-wise parallel learning mechanism, in which model selectionand parameter estimation are concurrently computed so that the time complexity of learningprocess is significantly reduced. Consequently, the proposed algorithm can achieve the besttradeoff between effectiveness and efficiency through greatly reducing learning time whilepreserving competitive learning accuracy. In contrast to the existing learning algorithms withmodel selection just dealing with networks with hundreds of nodes, the proposed algorithmcan scale to the large networks with ten thousands of nodes.3) Based on the characteristics of community, in the framework of random walk model, anovel node ranking algorithm is proposed. It is the node ranking algorithm that considers theglobal structure of networks described by community and local structure of networksdescribed by proximity alignment. Given a network and one target node or a group of targetnodes, the ranking problem is to find out a sequence of nodes that are the most relevant to thetarget nodes in terms of network structure. In essence, most of recommender systems and linkprediction algorithms are the problem of node ranking. The proposed algorithm first dividesthe network into communities in terms of its global structure and then compute the proximityalignment of nodes in terms of the local structure consisting of the communities containingthe target nodes. Finally, the proximity alignment contains the community-based globalinformation and subgraph-based local information. It is the key difference between theproposed algorithm and the existing algorithm. To detect communities, a novel communitymining algorithm is proposed to find out communities, overlapped communities and theirhierarchical structure in the networks. To calculate proximity alignment, a novel nodeproximity algorithm based on random walk model is presented, which not only can calculatethe node proximity on one target node, but also can calculate node proximity on a group oftarget nodes. To address the large networks, an offline/online schema is proposed to implement the proposed node ranking algorithm, which decomposes the task of node rankinginto two phases, i.e. offline phase and online phase. In the offline phase, the expensivecomputations, such as the community detection and indexing nodes of communities, areimplemented. In the online phase, the proximity alignment is calculated in terms of thesubgraph containing the target node. Finally, node ranking can be quickly and efficientlycalculated.These studies focus on statistical network models, especially, stochastic blockmodel andrandom walk. Some key issues of statistical network models, such as network modeling,statistical inference and their applications, have been explored. The results of studies not onlypromote the further development of statistical models, but also provide the efficientapproaches to analyze the structure of networks.
Keywords/Search Tags:Complex network, Statistical learning, Stochastic blockmodel, Random walk model, Structure mining, Bayesian model, Variational inference, Model selection
PDF Full Text Request
Related items