Font Size: a A A

Research On Key Technologies Of Large-scale Graph Processing

Posted on:2018-02-26Degree:DoctorType:Dissertation
Country:ChinaCandidate:J H ZhouFull Text:PDF
GTID:1318330518997787Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
During the past ten years, with the booming development of information and com-munication technology, human society has stepped into the era of big data. Every time,massive amounts of information are being generated, and accumulated into data gold-mines. In fact, a large portion of these massive amounts of information can be abstracted as graph-structured data quite naturally, for example, social graphs, web link graphs,consumer-products graphs, etc. Accordingly, many applications from real life can be converted into graph tasks very naturally. In recent years, as the graph-structured data expands continuously, efficient analytics and processing for large-scale graphs can bring more and more significant academic, economic and social values. Thereby, large-scale graph problems have attracted more and more attention from academia and industry.Large-scale graph problems involve many subjects, such as graph algorithm, graph storage and graph computation. As an architecture researcher, we focus on graph com-putation and graph storage. From the perspective of system architects, we are facing two main challenges for an energy-efficient large-scale graph processing system: how to processing graph data efficiently, and how to store and access graph data efficiently.Facing the first challenge, we propose StreamGraphChi and Mermai which are two high performance disk-based graph processing systems. Due to Moore Laws and Dennard Scaling gradually lose effectiveness, heterogeneous computing is favored. We propose TuNao, which aims to improve the energy-efficiency for large-scale graph processing by dedicated hardware accelerator. In response to the second challenge, we mainly pay attention to hash lookup table which is widely applied to databases, and we propose FAHT, which aims to accelerate the query performance for graph database. Specifi-cally, we mainly make the following contributions:· StreamGraphChi. In this work, we design a novel programming framework and a novel engine for graph processing, which obey the edge-centric graph model,support sequential disk accesses and avoid generating a large amount of extra temporary data. In addition, depending on the available memory capacity of the platform and the scale of the input graph, we implement two engines, including IM-StreamGraphChi and OM-StreamGraphChi. According to the skewed power-law, the graph processing system can choose an appropriate engine for a given host and a certain input graph adaptively. StreamGraphChi aims to improve the disk bandwidth utilization and reduce the disk I/O ammounts, so that it is able to achieve higher performance.Mermaid. The vertex-centric and the edge-centric are two common graph ex-ecution modes. In this work, we analysis these two modes and argue that the vertex-centric graph mode is more applicable to high-degree vertices and the edge-centric graph mode is more applicable to low-degree vertices. The vertex degrees in large-scale real-world graphs often obey the skewed power-law distri-butions, and thus just using only one graph execution mode isn't a good choice for those real-world graphs. Consequently, basing on the IM-StreamGraphChi engine, we redesign the graph representation method, the programming frame-work and the processing engine, so as to integrate these two graph modes together skillfully and then improve the system performance.· TuNao. Currently, using dedicated hardware to improve the energy-efficiency of special applications is a promising way. Fortunately, large-scale real-world graph tasks often obey the similar computing framwork, which makes it possi-ble to design a graph computation hardware accelerator. In this work, using the current memory storage technology, we mainly focus on memory access, compu-tation and applicability about the design of this accelerator, and manage to utilize features of real-world graph data effectively. For memory access, we manage to reduce random accesses, improve data locality and reduce off-chip memory accesses. For computation, we utilize hardware pipelines and manage to exploit potential parallelism efficiently. For applicability, we use reconfigurable hard-ware units to apply to different graph algorithms.FAHT. Hash table is a common data structure, and is widely used. In traditional hash table, query overheads related to keywords mainly include storage overhead,memory access overhead and computation overhead. The main purpose of hav-ing keywords is to ensure the absolute correctness of returned results. With the scale of hash table enlarging continuously, these overheads are more and more significant, especially when the keyword has big size. Some works point out that hash table without keywords will have much better performance. Of course, it cann't ensure correctness for each query result. Fortunately, many real-life appli-cations are able to tolerate a low error rate. Therefore, we redesign the dynamic insertion, the dynamic deletion and the dynamic query algorithms, and adopt a novel two-level storage structure. What's more, we make a theoretical analysis for the storage usage and the query error rate, and corresponding calculation for-mulas are given. Theoretical analysis and experimental data show that FAHT is superior to the existing methods.
Keywords/Search Tags:graph computation, large-scale graph-structured data processing, largescale real-world graphs, disk random accesses, memory random accesses, dedicated hardware accelerator, reconfigurable computing, data locality, hash lookup table
PDF Full Text Request
Related items