Font Size: a A A

Research On The Communication And Cache Techniques In BSP-Based Large Scale Graph Processing Systems

Posted on:2013-08-16Degree:MasterType:Thesis
Country:ChinaCandidate:Q S BaiFull Text:PDF
GTID:2298330467974659Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the development of computer and network technology, parallel distributed computing in clusters has become the trend. The major advantage of Cloud Computing is its great power of parallel computing, and it’s based on a simple and effective parallel programming model. A most representative one is the MapReduce distributed parallel programming model proposed by Google. However, with the rapid development of Internet applications in recent years, analysis of big graphs like Web graphs and social networks has become the research focus, e.g. shortest paths in social networks, PageRank in web search, etc. These graph computing problems usually need many iterations, but MapReduce is for common big data computing problems, so that applying into these graph issues possibly result in sub-optimal performance. So some parallel models based on messages passing will suit these graph algorithms better. BSP (Bulk Synchronous Parallel) is a parallel computing model with supporting to messages passing, with an asynchronization parallelism inside bulks and an explicit synchronization among bulks. With the proposal of Pregel system, a system for large-scale graph processing based on BSP model, by Google, adopting BSP model for large-scale graph processing has become the major method.The purpose of this article is to study the techniques of the messages communication and the disk cache in BSP-based large-scale graph processing systems. We propose a way of organization of messages and communication method based on messages queues, and we present an optimized communication method based on messages packing, multi-producers pool and messages combining supported. Aiming at the problem that the memory of BSP-based large-scale graph processing systems may be not adequate to contain all the graph and messages data during computation, we build a memory data management model, and based on the thought of memory first, we present the disk cache strategies and algorithms of graph and messages data separately:MF-GHIC algorithm, MLF graph traversing algorithm and Messages queues’ priorities-based messages data disk cache algorithm.Applying the techniques of messages communication and disk cache presented in this article into NEU-BSP system, we take a series of experiments. Firstly, we analyze the optimal values of the parameters in the communication scheme and the relationship of the parameters. Secondly, we proved that the time performance declines indistinctively when less than30%of data cached onto disk. Finally, taking PageRank algorithm and single source shortest paths algorithm for examples, we compare the NEU-BSP with Hadoop, and the results turn out that the NEU-BSP will perform faster than Hadoop by from1.2to18times when data is all in memory, and the NEU-BSP will perform equally to Hadoop when there is more than30%data cached onto disk.
Keywords/Search Tags:BSP, graph processing, message communication, disk cache, cloud computing
PDF Full Text Request
Related items