Research On BFS Algorithm Under Multiple Architectures

Posted on:2019-04-14

Degree:Master

Type:Thesis

Country:China

Candidate:Z W Peng

Full Text:PDF

GTID:2428330611993633

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Breadth-First Search(BFS)is the underlying kernel algorithm for many graph applications such as social networks,medical informatics,transport systems,etc.Therefore,it has been absorbed as a core of Graph500,used to evaluate the capability of supercomputers in terms of big data processing.This paper is based on the Guangzhou Supercomputer Center in the Milky Way ?provides a platform for distributed memory architecture and shared memory architecture of BFS algorithm.First of all,we quantitatively analyze the performance bottleneck of BFS algorithm in distributed memory architecture.And put forward the optimization method of load balancing on the distributed memory architecture,the "top-down" phase synchronization is key information accessed,in "bottom-up" phase synchronization key queue information to achieve balance in each process communication times.At the same time,this paper to reduce the length of message and communication through the local marker method to reduce the number of invalid communication by way of asynchronous communication library to re package small message headers,reduce the communication of distributed memory architecture.Secondly,according to the Knights Landing(KNL)chip to optimize BFS algorithms for shared memory architecture.For the heterogeneous memory hierarchy of KNL,this paper uses KNL chip to provide the Flat model,the performance difference between MCDRAM and DRAM,the vertex data,the parent node data distribution in MCDRAM,the edge data,temporary variables and array allocation in DRAM on the way to achieve read and write data and faster memory access.At the same time,to update the cache strategy,the main work of this paper is to use Clik algorithm for fine-grained redivides,to ensure that each work is suitable for a 8 thread reads a CPU vertex cache line value,and will not update the values of each of the other effects of cache line.Finally,we evaluated the experiment in Tianhe-2 supercomputer and KNL chip.The experimental results show that the distributed memory architecture,our method can effectively shorten the communication time,compared with the top-down method,obtained about 3 times speedup.In the KNL platform,our method is effective in reducing the last level cache of load and store the number of instructions,which was about 10 times speedup in ramt.

Keywords/Search Tags:

Graph500, Breadth-First Search, KNL, Communication optimization, Cache update strategy

PDF Full Text Request

Related items

1	Research On K-medoids Clustering Algorithm Based On Breadth-first Search
2	Load-aware Optimization On Cache Management For SSD
3	Based On The Design And Implementation Of The Theme Of The Breadth-first Crawler
4	Parallel Breadth-first Search Algorithms
5	Research On Cache Strategy Of Mobile Edge Network
6	Research On Optimization Of CDN Caching Strategy Based On Machine Learning
7	Research On Collaborative Edge Cache Strategy
8	The Research And Design Of Data Cached Strategy Based On Eventual Consistency
9	The Study On Inference Based Complete DCOP Algorithms And Experiment Platform Implementation
10	Research On Dynamic Cache Strategy For Flight Search Engine