Scalable Analysis of Information Flows in Networks

Posted on: 2015-12-17
Degree: Ph.D
Type: Thesis
University: University of Minnesota
Candidate: Subbian, Karthik
Full Text: PDF
GTID: 2478390017993729
Subject: Computer Science
Abstract/Summary:
Social, collaboration and information networks are rich in interactions through the exchange of different types of content, such as video, audio, text, and short and long hyperlinks. These networks are content-rich due to the heterogeneity and size of the content generated by the nodes over long periods. Some networks, such as synthetic neural networks and climate networks, may not have content but may have metadata. Despite their differences, all these networks are extremely complex to comprehend, and modeling them is even more difficult when they are dynamic and continuously evolving in time. Moreover, the underlying interaction phenomena in these networks, such as information diffusion, can be understood well only at scale, and traditional algorithms are either sequential or do not take advantage of content and its temporal dynamics.

In social and collaboration networks, such as Twitter and DBLP, content propagates from one node to another through the influence of the author, the nature of the content and the time of posting. Sometimes a reposted message is not identical to the original message but is sufficiently influenced by it. The goal of this task is to understand the causal behavior of topical influence in networks by developing an information flow mining tool. This tool can be used in a variety of applications, from understanding the most influential authors in social media to finding outlier communication sequences in cyber-crime. Traditionally, mining of content sequences is approached as a sequence mining problem with no underlying network structure. One can later associate the content sequences with the network structure as a post-processing step. However, treating content and network structure independently is extremely inefficient: the number of content sequences extracted in the first step, without knowledge of the network structure, can be exponentially large, and processing may never finish.
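The blow-up described above can be made concrete with a toy count. The sketch below is only an illustration and not the thesis's algorithm: the four-node graph and the function names `unconstrained_candidates` and `network_guided_candidates` are assumptions made for this example. It compares the number of candidate node sequences generated with no knowledge of the network against the number that actually follow directed edges.

```python
from itertools import permutations


def unconstrained_candidates(nodes, length):
    """All ordered sequences of distinct nodes, ignoring the network entirely."""
    return list(permutations(nodes, length))


def network_guided_candidates(edges, length):
    """Only sequences in which each consecutive pair is joined by a directed edge."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    nodes = sorted({n for edge in edges for n in edge})
    paths = [(n,) for n in nodes]
    # Grow every path by one edge per round, never revisiting a node.
    for _ in range(length - 1):
        paths = [
            p + (nxt,)
            for p in paths
            for nxt in successors.get(p[-1], [])
            if nxt not in p
        ]
    return paths


# Hypothetical toy network: an edge (u, v) means content can flow from u to v.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]

blind = unconstrained_candidates(nodes, 3)
guided = network_guided_candidates(edges, 3)
print(len(blind), len(guided))  # 24 candidates versus 3
```

Even on four nodes the edge constraint removes most candidates; the gap widens quickly as the number of nodes and the sequence length grow.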
We propose an integrated approach, "InFlowMine", for mining information flow patterns by tightly integrating content and network structure during the mining process. The network structure is used to guide the candidate generation process of content sequence extraction. Our approach to mining these patterns is an order of magnitude faster than state-of-the-art sequence mining techniques. We evaluated the discovered information flow patterns in the context of an influence analysis application and found the patterns to be extremely useful. We show that using patterns to mine influencers is equivalent to maximizing a sub-modular function and comes with a [...]

When we deal with extremely large graphs, the InFlowMine approach is still inefficient: it is sequential in nature, runs on a single computer and does not scale. Parallelization for such problems is usually handled at a sub-graph level or at a flow path level, where each sub-graph or flow path is treated as an independent computational unit. Instead, our approach provides the highest level of parallelism by treating each node in the network as an independent computational unit. Our approach integrates network, content and time by having each node exchange with its neighbors a compact summary of the content it propagated, together with time information. Each vertex updates its internal flow representations and creates a new set of summary objects to exchange with its neighbors in the next iteration. In each iteration we discover flow paths one step longer than in the previous iteration, and the process terminates when there are no further paths to expand. We use a vertex-centric computational model for mining the information flow patterns in the context of the Gather-Apply-Scatter (GAS) framework.
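The iteration described above can be sketched as a sequential simulation of synchronous vertex-centric supersteps. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `expand_flow_paths`, the single `post_time` per vertex, and the rule that content flows along an edge only when the receiver posted later than the sender are all hypothetical simplifications, and a real system would run the per-vertex work in parallel and exchange compact summaries rather than full path lists.

```python
def expand_flow_paths(edges, post_time):
    """Grow flow paths by one hop per iteration until nothing can be extended.

    edges: directed (src, dst) pairs.
    post_time: hypothetical map from vertex to the time it posted the content.
    """
    out_nbrs = {}
    for src, dst in edges:
        out_nbrs.setdefault(src, []).append(dst)

    # Iteration 0: each posting vertex holds the one-node path ending at itself.
    frontier = {v: [(v,)] for v in post_time}
    all_paths = []

    while frontier:
        # Scatter: every vertex sends its newest paths, tagged with its own
        # post time, to each of its out-neighbours.
        inbox = {}
        for u, paths in frontier.items():
            for v in out_nbrs.get(u, []):
                inbox.setdefault(v, []).append((post_time[u], paths))

        # Gather and apply: a vertex extends a received path only if it
        # posted strictly later than the sender (assumed causality rule).
        frontier = {}
        for v, messages in inbox.items():
            if v not in post_time:
                continue
            for sent_time, paths in messages:
                if post_time[v] > sent_time:
                    extended = [p + (v,) for p in paths]
                    frontier.setdefault(v, []).extend(extended)
                    all_paths.extend(extended)

    return all_paths


# Hypothetical toy input: a posts first, then b, then c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")]
toy_times = {"a": 1, "b": 2, "c": 3}
print(sorted(expand_flow_paths(toy_edges, toy_times)))
```

Because post times strictly increase along every path, no path can revisit a vertex, so the loop ends as soon as an iteration produces no extensions.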
Our approach, "pFlower", scales linearly with the number of cores and provides three orders of magnitude improvement over the baseline.

Today, data from most large social networks, such as Twitter or Facebook feeds, is available in a streaming fashion, where each message that propagated along a set of edges in the network arrives in a data stream with time stamp information. The volume and velocity of such data streams are huge, on the order of tens of thousands of objects per second. Most of this data has to be processed as it arrives in order to provide near real-time information on flow patterns. In such scenarios, one needs to maintain approximate information flow patterns that are recent and highly frequent for different topics of interest. Current techniques for topic modeling completely ignore the availability of network structure, while network analysis ignores the content. Our approach integrates both by developing online topic models maintained on evolving approximate flow cascades. Our approach for recommendation in social networks improves precision and recall by up to 18% compared to baselines.

In summary, this thesis presents a set of information flow mining algorithms and applications for understanding large-scale social and collaboration networks, using content, network and temporal information in a unified and scalable fashion.
Keywords/Search Tags:Information, Network, Content, Flow, Social, Collaboration, Mining, Approach