Customizing A Chip Multiprocessor For The Breadth-first Search Algorithm

Posted on:2015-12-16

Degree:Master

Type:Thesis

Country:China

Candidate:J F Yang

Full Text:PDF

GTID:2308330452469523

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

The high power consumption has seriously restricted the development of future highperformance computing systems. As a result, the processor design has been moreand more focused on the improvement of the performance-per-watt. The processorarchitecture customizing for some crucial applications has become one of the mostimportant technology-trends. On the other hand, the applications of graph dataprocessing have been used more and more widely in many areas. As the basis ofmany graph algorithms, the breadth first search (BFS) algorithm becomesincreasingly important.With the development of multi-/many-core processor-architectures, more and morecores will be integrated into a single chip. Thus, mapping the BFS algorithm to theChip Multiprocessor (CMP) faces the following challenges: long memoryaccess-latencies followed by the relatively simple computation, the poor data locality,load-imbalance and synchronization overheads of the parallel execution. This paperproposes a hardware/software co-optimized solution for the efficient and scalableBFS algorithm on a large scale CMP.1) In our algorithm, we divided the BFSalgorithm into two stages, the memory access part and the computing part; morethreads (or cores) has been used to run the first part that is relatively slow, which canhide access latencies and parallelize memory accesses and computation. At the sametime, we used different data-partitioning methods respectively for the memory accesspart and the computing part, to achieve the load-balance while avoidinglock-operations.2) We customized a processor core uses the local memory on-chipinstead of the last level cache to store the frequently accessed discrete data. Thismethod improves the performance and decrease the power-consumption.3) We useda fast on-chip network for the rapid data-transferring between cores.Our experiments indicate that, with the appropriate algorithm design andarchitecture support, graph data processing can be implemented efficiently on thepower efficient CMP. The simulation of a customized16-Xtensa-core CMP hasshown that our BFS algorithm can achieves90%of the performance of an IntelXeon X5560processor running the Graph500’omp-csr’ benchmark(the CMOStechnology is the same), while the chip area is10%and the TDP is1.6%of the later.As a result, the performance-per-watt increases41.98times.

Keywords/Search Tags:

breadth‐first search, multiprocessor, customized processor

PDF Full Text Request

Related items

1	Design And Implementation Of Customized Control Processor And On-Chip Bus Interface In MPSOC
2	Research On The Design And Implementation Techniques Of Embedded Heterogeneous Multiprocessor
3	Design And Implementation Of User-customized Desktop Search Engine
4	Research On Customized Web Crawling
5	Development Of Dual-core Processor System Based On Nios Ⅱ
6	Research On Key Technologies Of Customed Processor Of Cipher Application
7	Oriented Research And Realization Of Embedded Linux System For Multi-processor Architecture
8	Ip Forwarding System Of Multi-core Network Processor-based Design And Realization
9	Design And Realization Of Personnel Scheduling System In Customized Equipment Assembly Workshop
10	The Design And Realization Of Multi-core Soc Platform For Media Applications