Font Size: a A A

Research Of Link Prediction And Information Spreading On Complex Networks

Posted on:2023-11-22Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y J RanFull Text:PDF
GTID:1520307046454004Subject:Intelligent computing and complex systems
Abstract/Summary:PDF Full Text Request
Complex networks are widely used to represent different kinds of complex systems,in which nodes are the units of a system and links are the associations between a pair of units.Many real-world systems such as biological systems,transportation systems,power systems,and social systems,can be studied using the theories and methods of complex networks.As an important branch of artificial intelligence,machine learning has been widely used to solve practical applications and scientific problems.More importantly,machine learning provides powerful technology for interdisciplinary research.Nowadays,a myriad of technological breakthroughs and cutting-edge sciences rely on the intersection and integration of multiple disciplines.Since interdisciplinary research can dialectically study the feasibility of methods from multiple perspectives,we can avoid the limitations from a single perspective.In the era of big data,discovering universal laws or hidden patterns from big data by effective methods can provide a theoretical basis for personalized recommendations,drug discovery,information spreading,and rumor control.At present,scholars mainly use traditional methods such as statistical physics to study link prediction and information spreading in the field of complex network.The core of this dissertation is to use statistical physics and machine learning to deeply study link prediction and information spreading,which aims to discover the universal law behind the link prediction task and infer the spreading mechanism of information spreading through interdisciplinary research methods.For link prediction,this dissertation focuses on the maximum capability of a topological feature in link prediction,which can be applied to optimize the feature and the method selection,and also advances our understanding of network characteristics associated with the utilization of a topological feature in link prediction.When quantifying the maximum capability of a topological feature,we find that almost all existing link prediction algorithms focus on short-path networks.Hence,we first use network topology and node properties to study static and temporal link prediction in long-path networks.For information spreading,this dissertation focuses on the macroscopic patterns of simple and complex contagion,which can guild us on how to interpret the empirical data,model and predict future spreading.When identifying the macroscopic patterns of simple and complex contagion,we find that the Linear Threshold(LT)model suffers several limitations in describing the time evolution of the spreading,and it is incompatible with existing models for simple contagion.Therefore,we first propose a Generalized Linear Threshold(GLT)model that can capture the continuoustime complex contagion process.The main research contents and major innovations of this dissertation are summarized as follows:First,this dissertation uses network topology and node properties to study static and temporal link prediction in long-path networks.In static link prediction,we propose a new index that is associated with the principles of Structural Equivalence and Shortest Path Length(SESPL)to estimate the likelihood of link existence in long-path networks.Through a test of 548 real networks,we find that SESPL is more effective and efficient than other unsupervised approaches in long-path networks.Meanwhile,we also exploit the performance of SESPL feature and embedding-based approaches via machine learning techniques,and the experimental results indicate that the best prediction performance comes from the SESPL feature.In temporal link prediction,to solve the temporal link prediction problems with new nodes,here we take into account the similarity of nodal attributes and the shortest path length,namely,ASSPL,to predict future links with new nodes.The results tested on scholar social networks and academic funding networks show that it is highly effective and applicable for ASSPL in funding networks with time-evolving.Meanwhile,we make full use of an efficient parameter to exploit how network structure or nodal attribute has an impact on the performance of temporal link prediction.Finally,we find that nodal attributes and network structure complement each other well for predicting future links with new nodes in funding networks.Next,this dissertation identifies patterns underlying the link prediction performance by selecting 18 indexes with topological features.We find that the maximum capability of a topological feature follows a simple and precise mathematical expression.The capability depends on the percentage of missing and nonexistent links that hold the feature,but is independent of the way that an index quantifies the feature.Hence,a family of indexes based on the same topological feature shares the same upper bound of prediction accuracy.The potential of all other indexes can be readily estimated through one single measurement.We also demonstrate that the supervised prediction in principle gives a more accurate result compared with the unsupervised prediction.But this improvement is not obtained merely by pushing the performance to the same upper bound.Instead,the maximum capability of the topological feature is lifted by utilizing the supervised method.The concise mathematical expression of the capability lift allows us to estimate the expected benefits of applying machine learning tools.Our work benefits from a recently announced large corpus of 550 structurally diverse networks,allowing us to empirically verify the universality of the pattern uncovered.The findings can be applied to optimize the feature and the method selection,and also advances our understanding of network characteristics associated with the utilization of a topological feature in link prediction.Further,this dissertation proposes a GLT model based on complex contagion for the continuous-time stochastic complex contagion process that can be efficiently implemented by the Gillespie algorithm.The evolutionary time in the GLT model has a physical meaning,which is associated with the rate of the underlying stochastic process.We find that compared with the GLT model,the traditional LT model tends to underestimate the spreading speed.The order of nodes being infected is properly defined in the GLT model,allowing us to better generate a synthetic spreading node sequence to model the spreading in real systems.Finally,the GLT model is compatible with the Susceptible-Infected(SI)model,because they are defined under the same mathematical framework.One can easily combine them to model a hybrid spreading process in which simple contagion accumulates the critical mass for the complex contagion that leads to the global cascade.Overall,the GLT model we proposed can be a useful tool to study complex contagion,especially when studying the time evolution of the spreading.Finally,this dissertation focuses on the continuous-time SI and LT model,which are the typical example of simple and complex contagion,respectively.We first analyze the evolution of the portion of the infected population at a time,capturing the spreading dynamics at the macroscopic level.We find that the curve from simple and complex contagion can overlap when the parameters are properly chosen.We then analyze the spreading dynamics at the microscopic level,characterized by the percentage or number of infected neighbors when a node becomes infected.As complex contagion requires more than one active neighbor to trigger the adoption,it is intuitively expected that the percentage or number of infected neighbors would be higher in LT model than those in SI model.However,this hypothesis is not supported by the simulation.When the property of the network changes,the percentage and number of infected neighbors generated by LT model can be higher or lower than that generated by SI model.Hence,different from our intuitions,we find that the two distinct models can not be differentiated by the spreading dynamics with confounding factors such as different initiator sizes,time evolution,and network topology.Instead,we propose that the temporal order in the spreading sequence can differentiate the two models,following the argument that complex contagion is apt to influence the local community whereas simple contagion can spread further and form a long spreading path.Using a simple machine learning tool,we confirm that the two spreading models can be accurately and robustly classified with different confounding factors whereas the baseline methods relying on the microscopic dynamic in general fail to accomplish the classification.Drawing on theoretical and large-scale empirical research,this dissertation reveals the universal law of a topological feature in link prediction,which can shed significant insights on optimizing network structure and designing efficient link prediction technology.Meanwhile,this dissertation also highlights the role of the temporal order in the spreading sequence in identifying the mechanism of the contagion process,which can provide theoretical guidance for the prediction and prevention of information spreading.
Keywords/Search Tags:Complex Networks, Statistical Physics, Machine Learning, Link Prediction, Information Spreading
PDF Full Text Request
Related items