Data Centric Hardware/Software Co-optimization For Chip Multiprocessor

Posted on: 2017-09-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Li
Full Text: PDF
GTID: 1318330533455160
Subject: Computer Science and Technology
Abstract/Summary:
Future extreme-scale computing systems face two important challenges: energy efficiency and data-centric application workloads. Because the chip multiprocessor (CMP) is one of the building blocks of such systems, its design and applications must also satisfy the requirements of being "low-power and energy-efficient" and "data-centric". This dissertation applies several state-of-the-art methodologies at different levels of CMP design to improve the energy efficiency, parallelism, and application adaptability of the CMP, including data-centric system design, application-driven design, and hardware/software co-optimization. At the core level, we adopt customizable design and hardware/software co-optimization for target applications, from the perspectives of both computation and data. At the network-on-chip level, we implement a hardware-supported message passing mechanism for the CMP. At the level of the parallel programming model, we propose a model that moves computation to the data. Finally, at the level of multithread scheduling, we propose a data-centric multithread scheduling mechanism for thread-level speculation.

The main contributions of this dissertation are:

1. For stencil computation, this dissertation combines general hardware/software optimization strategies with application-oriented customizable core design. Starting from a simple customizable core, we build a low-power, energy-efficient core/accelerator for stencil computation. Through a series of hardware/software co-optimization methods, we improve computational parallelism and data-transfer efficiency and reduce the amount of data transferred.

2. This dissertation designs a dedicated message engine to optimize the message passing mechanism on the CMP and implements the engine at the RTL level. The engine optimizes message passing in three respects: eliminating unnecessary data copies, improving the efficiency of large-message transmission, and reducing the overhead of complex communication commands.

3. This dissertation proposes an In Place computation model to overcome the execution bottleneck of some irregular applications on the CMP. The key idea of the model is to move the computation to the data. In the In Place model, we reduce on-chip data movement to improve the execution efficiency of simple cores, employ "division" and "delegation" mechanisms to avoid update contention in irregular data accesses, and construct a "core-level" pipeline to improve parallelism and bandwidth utilization. The In Place model improves performance, scalability, and energy efficiency for some typical irregular applications on the CMP.

4. This dissertation implements a "compatibility" thread-level speculation (TLS) mechanism on the CMP to improve single-thread performance by using idle computation resources to accelerate sequential execution. To adapt to the non-uniform distribution of thread data on the chip, we also propose a data-centric scheduling mechanism for speculative threads. The proposed mechanism improves the efficiency of TLS execution on the CMP.
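
The abstract does not spell out how "division" and "delegation" work, so the following is only a generic owner-computes sketch of one way such a scheme could avoid update contention, not the dissertation's actual design. It assumes a block partition of the data across cores ("division") and per-core inboxes to which updates are forwarded ("delegation"); the names `owner_of` and `in_place_scatter_add` are hypothetical, and the cores are simulated sequentially.

```python
from collections import defaultdict


def owner_of(index, block):
    """Hypothetical block partitioning: element i lives on core i // block."""
    return index // block


def in_place_scatter_add(size, updates, num_cores):
    """Apply irregular (index, value) updates with an owner-computes scheme.

    updates[c] is the list of (index, value) pairs produced by simulated core c.
    This is an illustrative assumption, not the mechanism from the dissertation.
    """
    block = -(-size // num_cores)  # ceiling division

    # "Division": each core holds only its own contiguous slice of the data.
    local = [[0] * max(0, min(block, size - c * block)) for c in range(num_cores)]

    # "Delegation": a producer never writes remote data; it forwards the
    # update to the inbox of the core that owns the target element.
    inbox = defaultdict(list)
    for items in updates:
        for index, value in items:
            inbox[owner_of(index, block)].append((index, value))

    # Each owner drains its inbox. Only the owner writes its slice,
    # so no two cores ever update the same element.
    for core in range(num_cores):
        base = core * block
        for index, value in inbox[core]:
            local[core][index - base] += value

    # Concatenate the slices only to inspect the result.
    return [x for part in local for x in part]
```

In this sketch, updates that target the same element from different producers all arrive at a single owner, which is what removes the need for locks or atomic operations on the data itself; the cost moves to forwarding the updates.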
Keywords/Search Tags: Chip Multiprocessor, Data Centric, Hardware/Software Co-optimization, Message Passing, Irregular Applications, Thread Scheduling, Thread-level Speculation