Font Size: a A A

Storage And Analysis Systems For Large-Scale Time-Evolving Graphs

Posted on:2016-07-05Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y S MiaoFull Text:PDF
GTID:1228330467490508Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
A graph consists of vertices and edges, capturing relationships between entities. Graph mining extracts valuable information from graphs, and is widely used in infor-mation retrieval and social analysis. In order to apply graph mining on large-scale real graph efficiently, many storage and analysis systems have been proposed to provide friendly algorithm interface and high-performance storage and execution. So the algo-rithm designers only need to write several lines of code to analyze large scale graph, without worrying about the system problems such as data access, execution scheduling, parallelization and fault-tolerance.Time-evolving graphs capture the dynamic information of a graph, which attracts increasing interests from data-mining communities. For example, up-to-date page rank-ings can help better serve search engines; users’long-term authority history can help better know these people; social graph diameter over time can tell us the evolving char-acteristics of a social network. However, existing graph systems are designed for static graph mining and cannot support mining on such time-evolving graphs.In this dissertation, we have analyzed the evolving graph mining problems and proposed a system to support managing and mining large-scale time-evolving graphs. It takes a stream of graph updates as input, keeping a consistent update-to-date graph view for near-real-time graph mining. On the other hand, it keeps the graph changes and merges them into efficient format for historical graph mining. To achieve this goal, we need to solve following two key problems that come from the additional time di-mension’s requirements comparing with existing static graph mining systems:First, we need to perform graph mining near real-time. We propose an online sub-system to solve this problem. It takes the incoming graph updates and applies them to a distributed graph for analysis. To achieve high-throughput and low-latency graph update with distributed consistency, It separates the graph update and graph compu-tation to avoid interference, and adopts an epoch-commit graph updating protocol for storage management. It also contains an incremental graph mining model which can sig-nificantly reduce the latency of graph computation. We have implemented the online subsystem with three applications, including user ranking, controversial topic detection and approximate shortest path. In the experiment, we evaluated its performance on real twitter data. The results show that, on a cluster with40servers, we can achieve100K tweets per second throughput, with data timeliness less than2.5minutes.Second, we need to perform graph minings on a long time range. We propose an offline subsystem to solve this problem. It maintains a long history of the time-evolving graph, and supports analysis on the data. Comparing with static graph analysis which involves only one graph snapshot, the analysis may involve a series of graph snapshots for long-term analysis, which increases the computation cost. On the other hand, the extra time dimension makes the graph data access patterns diverse, and hard to keep time-evolving queries efficient. Locality is the design center of this subsystem, where temporal graphs are carefully laid out both on persistent storage and in memory, taking into account data locality in both time and graph-structure dimensions. The offline sub-system also introduces locality-aware batch scheduling. By reducing the cache-miss and synchronization cost during computation, we significantly improved the perfor-mance of graph minings on a very long time range. In the experiment, we evaluated the offline subsystem on different large-scale real-world graph with different queries and applications. Comparing with existing static graph engines, it can achieve an order of magnitude improvement. And as a time-evolving graph store, it shows up to5times better than existing database solutions.
Keywords/Search Tags:temporal graph, time-evolving graph, graph processing, graph storage, distributed system
PDF Full Text Request
Related items