Research On The Communication And Cache Techniques In BSP-Based Large Scale Graph Processing Systems

Posted on:2013-08-16

Degree:Master

Type:Thesis

Country:China

Candidate:Q S Bai

Full Text:PDF

GTID:2298330467974659

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

With the development of computer and network technology, parallel distributed computing in clusters has become the trend. The major advantage of Cloud Computing is its great power of parallel computing, and it’s based on a simple and effective parallel programming model. A most representative one is the MapReduce distributed parallel programming model proposed by Google. However, with the rapid development of Internet applications in recent years, analysis of big graphs like Web graphs and social networks has become the research focus, e.g. shortest paths in social networks, PageRank in web search, etc. These graph computing problems usually need many iterations, but MapReduce is for common big data computing problems, so that applying into these graph issues possibly result in sub-optimal performance. So some parallel models based on messages passing will suit these graph algorithms better. BSP (Bulk Synchronous Parallel) is a parallel computing model with supporting to messages passing, with an asynchronization parallelism inside bulks and an explicit synchronization among bulks. With the proposal of Pregel system, a system for large-scale graph processing based on BSP model, by Google, adopting BSP model for large-scale graph processing has become the major method.The purpose of this article is to study the techniques of the messages communication and the disk cache in BSP-based large-scale graph processing systems. We propose a way of organization of messages and communication method based on messages queues, and we present an optimized communication method based on messages packing, multi-producers pool and messages combining supported. Aiming at the problem that the memory of BSP-based large-scale graph processing systems may be not adequate to contain all the graph and messages data during computation, we build a memory data management model, and based on the thought of memory first, we present the disk cache strategies and algorithms of graph and messages data separately:MF-GHIC algorithm, MLF graph traversing algorithm and Messages queues’ priorities-based messages data disk cache algorithm.Applying the techniques of messages communication and disk cache presented in this article into NEU-BSP system, we take a series of experiments. Firstly, we analyze the optimal values of the parameters in the communication scheme and the relationship of the parameters. Secondly, we proved that the time performance declines indistinctively when less than30%of data cached onto disk. Finally, taking PageRank algorithm and single source shortest paths algorithm for examples, we compare the NEU-BSP with Hadoop, and the results turn out that the NEU-BSP will perform faster than Hadoop by from1.2to18times when data is all in memory, and the NEU-BSP will perform equally to Hadoop when there is more than30%data cached onto disk.

Keywords/Search Tags:

BSP, graph processing, message communication, disk cache, cloud computing

PDF Full Text Request

Related items

1	Research On Key Technologies Of Host-side Caching In Cloud Computing
2	Leveraging Compressed Cache To Improve Disk Performance
3	Key Technology Of Network Cache In Cloud Storage System
4	Research And Implementation Of Graph Data Processing Technology Based On Cloud Computing Environment
5	Research On Message Communication Model Under Comet Cloud Environment
6	Design And Implementation Of Android Mobile Cloud Disk
7	Research And Implementation Of Execution Optimization For Graph Computing With Application Resource Awareness In Cloud Environment
8	The Research Of Cache Optimization For Small Files And System Implementation
9	The Research Of Distributed Cache System Based On Cloud Computing
10	Research On Heterogeneous And Multi-core Graph Computing Systems