
Co-optimization Design Of Multicore Systems For High Performance Computing Nodes

Posted on: 2016-04-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Y Yu
Full Text: PDF
GTID: 1318330482472526
Subject: Signal and Information Processing
Abstract/Summary:
Advances in integrated circuit technology create an opportunity for multi-core system-on-chip processors: all levels and components of the system (interface protocols, caches, interconnection networks, etc.) can be explored for cross-level optimization. This dissertation focuses on the computing nodes of high-end systems and explores solutions from three aspects: task scheduling, storage subsystems, and on-chip interconnects.

First, a heterogeneous multicore task scheduler based on a master-slave real-time operating system is proposed. The operating system is the intermediate layer between applications and hardware, and its efficiency strongly influences the performance of the entire system. Based on an analysis of the parallel programming model and execution mode, we propose a master-slave scheduler whose main features are: 1) a simple and clear interface protocol that works with a data-flow parallel programming model, making it easier for programmers to write parallel programs; 2) a task scheduler and its agent directors, which cooperate to manage the running programs; 3) a task classification mechanism that simplifies the scheduling process, together with a dedicated control subnet for transmitting synchronization messages, which reduces the overhead of task scheduling. Simulation results show that, on a system configured with a master processor and 8 accelerator cores, the proposed scheduler improves the average speedup of parallel programs by 16% and scales well.

Second, a thread-aware adaptive data prefetcher is presented. The last-level cache and off-chip memory are usually shared among the threads running in a multi-core system. Data prefetch requests introduce additional contention for the shared cache and memory bandwidth, which can reduce application efficiency and limit system scalability. Based on an analysis of this shared-resource contention, we propose a thread-aware adaptive data prefetcher that includes: 1) a thread classification mechanism that dynamically adjusts the data prefetch engines at run time and reduces contention for shared resources; 2) a filtering mechanism that avoids inter-thread invalidations caused by prefetching; 3) a critical-thread acceleration mechanism that identifies and accelerates the critical threads, that is, the threads on the longest paths of execution. On a set of parallel benchmarks, the proposed mechanism reduces inter-thread shared-data invalidations by 7% and the average waiting time of thread synchronization by 6% relative to a baseline system with a multimode prefetcher. It reduces execution time by 18% relative to a system without prefetching.

Finally, a simulation method for hybrid optical-electrical on-chip interconnects is presented. How to take advantage of optical interconnects and integrate them with traditional electrical interconnects is an important research direction for networks on chip. The optical and electrical devices are accurately modeled. A cycle synchronization mechanism provides cycle-accurate simulation, and a multithreading mechanism extends the simulation scale to 256 nodes. Through simulation experiments, this dissertation analyzes power, latency, and other performance parameters for different cluster sizes, and proposes future optimization directions for on-chip interconnection networks.

The multi-core optimization design method proposed in this dissertation reduces the cost of multi-core task scheduling, mitigates shared-data conflicts between threads, and improves the efficiency of the operating system, the storage subsystem, and the on-chip interconnection networks. The method is a valuable reference for the computing nodes of future high-end systems.
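The thread classification, filtering, and critical-thread ideas described for the prefetcher can be illustrated with a small sketch. This is not the dissertation's design: the abstract does not state which metrics or thresholds are used, so the counters (`issued`, `useful`, `invalidations`), the thresholds (`acc_hi`, `acc_lo`, `inval_limit`), the three aggressiveness levels, and all function names below are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass

# Hypothetical aggressiveness levels, ordered from least to most aggressive.
LEVELS = ["off", "conservative", "aggressive"]


@dataclass
class ThreadStats:
    issued: int              # prefetches issued by this thread (assumed counter)
    useful: int              # prefetched lines that were later used (assumed counter)
    invalidations: int       # inter-thread invalidations hitting prefetched lines
    critical: bool = False   # thread is estimated to be on the longest path


def accuracy(s: ThreadStats) -> float:
    """Fraction of issued prefetches that turned out to be useful."""
    return s.useful / s.issued if s.issued else 0.0


def classify(s: ThreadStats, acc_hi=0.75, acc_lo=0.40, inval_limit=0.10) -> str:
    """Pick a prefetch level for one thread (thresholds are assumptions)."""
    if s.issued == 0:
        level = 1  # no history yet: start conservatively
    else:
        inval_rate = s.invalidations / s.issued
        acc = accuracy(s)
        if acc >= acc_hi and inval_rate <= inval_limit:
            level = 2
        elif acc >= acc_lo:
            level = 1
        else:
            level = 0
    if s.critical:
        # Critical threads get one extra step of aggressiveness.
        level = min(level + 1, len(LEVELS) - 1)
    return LEVELS[level]


def should_issue(line_addr: int, lines_written_by_others: set) -> bool:
    """Filter: skip prefetches to lines other threads recently wrote."""
    return line_addr not in lines_written_by_others
```

In this sketch, a thread with accurate prefetches and few invalidations is allowed to prefetch aggressively, an inaccurate thread is throttled, and a thread flagged as critical is moved up one level. A real design would derive these decisions from hardware counters sampled at run time.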
Keywords/Search Tags: Multicore systems, Interface protocol, Thread-aware, Data prefetching, Optical-electrical interconnects