Font Size: a A A

Information Diffusion and Distance Query in Large-Scale Networks: Theory, Algorithms and Application

Posted on:2018-02-19Degree:Ph.DType:Thesis
University:The Chinese University of Hong Kong (Hong Kong)Candidate:Lin, YishiFull Text:PDF
GTID:2448390005951639Subject:Computer Science
Abstract/Summary:
With the popularity of the internet, online social networks have become an indispensable part of our daily life. Theoretical analysis of online social networks is fundamental to understanding and modeling human behaviors such as epidemic spreading and information diffusion. Meanwhile, efficient algorithms for graph analytics or social network applications (e.g., viral marketing campaigns) are essential in analyzing social networks and utilizing them to better serve us. In this thesis, we focus on the diffusion process in social networks and its application, as well as developing efficient algorithms for graph analytics.;First, we study the epidemic spreading process by proposing and analyzing a generalized Susceptible-Infected-Susceptible (SIS) model. We use the mean-field analysis technique to determine the condition which can prevent the outbreak of diseases. We also extend our generalized SIS model to analyze the dynamics of two competing sources. We present the analytical derivation and show via experiment how different factors such as initial condition, transmission rates, recovery rates or the number of states affect the phase transition process and the final equilibrium. Our models and methodology serve as essential tools in analyzing the information diffusion process in large networks.;Then, we study two influence maximization problems. Influence maximization problems are mathematical characterizations of viral marketing campaigns, focusing primarily on how to select "seed users" (or, "initial adopters") so to maximize the spread of influence via the "word-of-mouth effect". Due to the large population of online social networks nowadays, the research on influence maximization problems has become a hotspot in social network analysis. We first study a novel competitive influence maximization problem with partial information: Given the influence diffusion model and the probability of every user being a "seed" (or a "initial adopter") of our competitor, how to find a set of k seeds so to trigger the largest expected influence cascade under the presence of competitors? Next, we study a novel influence boosting problem: Motivated by the observation that direct incentives could "boost" users so that they are more likely to be influenced by friends, we study the k-boosting problem which aims to find k users to boost so that the final "boosted" influence spread is maximized. For the two problems, we design approximation algorithms. We provide formal analysis about the performance guarantee and the expected running time. We also conduct extensive experiments on real-world datasets to demonstrate the efficiency and effectiveness of proposed algorithms.;Lastly, we study the problem of processing distance queries on disk-resident scale-free dynamic graphs, motivated by the need for low-latency distance queries in large-scale "dynamic" graphs. Our query processing utilizes the classical canonical labeling method, and we propose two novel I/O efficient algorithms to update the canonical labeling. We also show how to answer distance queries on the latest network based on outdated labels and new edges. Extensive experiments demonstrate the efficiency and the superiority of our algorithms.
Keywords/Search Tags:Algorithms, Networks, Information diffusion, Distance, Influence maximization problems
Related items