
High Performance And Energy Efficient Task Scheduling In Complex Parallel Architectures

Posted on: 2015-09-14    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Q Chen    Full Text: PDF
GTID: 1228330452966616    Subject: Computer software and theory
Abstract/Summary:
Due to physical limitations in manufacturing, high energy consumption, and heat dissipation problems, core frequency can no longer keep pace with the growing demand for computational power in the high performance computing field. As a result, a variety of parallel computing architectures built on multi-core processors are developing rapidly. How to efficiently utilize the abundant computational resources of these parallel architectures has become a research hotspot in high performance computing. In particular, the most important question is how, without modifying the programs, optimal task scheduling can improve the performance of parallel programs, reduce the energy consumed in executing them, and improve and balance the performance of programs that run concurrently.

Targeting various mainstream parallel architectures, we thoroughly investigate high performance and energy efficient task scheduling policies. Based on these policies, we propose and implement a runtime High Performance Energy Efficient task scheduling system (HPEE system) for complex parallel platforms. The HPEE system consists of five main modules: a cache-aware bi-tier work-stealing module, a locality-aware work-stealing module, a bandwidth-conscious core allocation module, a workload-aware task scheduling module, and an energy-efficient workload-aware task scheduling module. These modules cover multi-socket multi-core architectures, multi-socket multi-core architectures with NUMA memory systems, asymmetric multi-core architectures, and multi-core architectures with DVFS support.

In a multi-socket multi-core architecture, the cores in the same socket share the last-level cache, whereas cores in different sockets share only the DRAM. Targeting this architecture, we therefore focus on optimizing shared cache performance. If a multi-socket multi-core architecture executes only one parallel program, the HPEE system uses the cache-aware bi-tier work-stealing module to schedule tasks that share data into the same socket. With this method, the shared data of the tasks only needs to be read into the shared cache once, and all the cores in the socket can then access it directly from the shared cache at high speed. Experimental results show that the cache-aware bi-tier work-stealing module can reduce the execution time of parallel programs by up to 74.4% compared with traditional work-stealing.

If the multi-socket multi-core architecture uses a NUMA memory system, the HPEE system uses the locality-aware work-stealing module to distribute the data set of a parallel program evenly across the memory nodes and to allocate tasks to the sockets whose local memory nodes store their data. Using this method, each task can access its data from either the shared cache or the local memory node. Experimental results show that the locality-aware work-stealing module can reduce the execution time of parallel programs by up to 54.2% compared with traditional work-stealing schedulers.
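To make the bi-tier idea concrete, the following is a minimal C++ sketch (illustrative only; the type names, the per-socket queue layout, and the worker loop are our own assumptions, not the HPEE system's actual interfaces). Tier one keeps work inside the worker's own socket so that sibling cores reuse data already resident in the shared last-level cache; tier two steals from another socket only when the local socket has run dry.

    // Minimal sketch of a bi-tier work-stealing loop (illustrative only; these
    // names are assumptions, not the HPEE system's API).
    #include <atomic>
    #include <deque>
    #include <functional>
    #include <mutex>
    #include <optional>
    #include <vector>

    using Task = std::function<void()>;

    struct SocketQueue {
        std::mutex lock;
        std::deque<Task> tasks;  // tasks whose shared data resides in this socket's LLC
    };

    struct Runtime {
        std::vector<SocketQueue> sockets;  // one queue per socket (the tier boundary)
        explicit Runtime(int num_sockets) : sockets(num_sockets) {}

        // Tier 1: take work from the worker's own socket so sibling cores reuse
        // the data already loaded into the shared last-level cache.
        std::optional<Task> pop_local(int socket_id) {
            auto& q = sockets[socket_id];
            std::scoped_lock g(q.lock);
            if (q.tasks.empty()) return std::nullopt;
            Task t = std::move(q.tasks.front());
            q.tasks.pop_front();
            return t;
        }

        // Tier 2: only when the local socket runs dry, steal from another socket,
        // keeping cross-socket data movement to a minimum.
        std::optional<Task> steal_remote(int socket_id) {
            for (int s = 0; s < (int)sockets.size(); ++s) {
                if (s == socket_id) continue;
                auto& q = sockets[s];
                std::scoped_lock g(q.lock);
                if (!q.tasks.empty()) {
                    Task t = std::move(q.tasks.back());  // take from the far end
                    q.tasks.pop_back();
                    return t;
                }
            }
            return std::nullopt;
        }
    };

    // Worker loop: prefer cache-friendly local work, fall back to remote stealing.
    void worker_loop(Runtime& rt, int socket_id, std::atomic<bool>& done) {
        while (!done.load()) {
            auto t = rt.pop_local(socket_id);
            if (!t) t = rt.steal_remote(socket_id);
            if (t) (*t)();
        }
    }

The producer side, which groups tasks that share data into the same socket's queue, and the lock-free deques used in production work-stealing runtimes are omitted for brevity.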
However, if multiple parallel programs run concurrently on a multi-socket multi-core architecture, they contend for both computational resources (cores) and storage resources (cache and cache bandwidth). How to allocate the computational and storage resources among the co-running programs so that they achieve high and balanced performance is a key issue. Targeting this issue, based on the real-time demands of the programs, the HPEE system uses the bandwidth-conscious core allocation module to periodically reallocate computational and storage resources among the co-running programs. Under the constraint of fulfilling the programs' demands, if most of the shared cache bandwidth of a socket is already occupied, the bandwidth-conscious core allocation module allocates the free cores in that socket to compute-intensive programs, and vice versa. In this way, the module minimizes shared cache contention and hence improves the performance of the co-running programs. Experimental results show that the module can reduce the response time of co-running programs by up to 54.7% compared with a traditional space-sharing scheme.

In an asymmetric multi-core architecture, different cores operate at different, fixed frequencies. Balancing the workload among cores with different frequencies is key to optimizing the performance of parallel programs on such an architecture. Targeting this issue, based on workload information about tasks collected at runtime, the HPEE system uses the workload-aware task scheduling module to schedule tasks. Based on the task types and the workloads of tasks of the same type, this module uses a history-based task allocation scheme to allocate to-be-executed tasks with heavy workloads to cores operating at high frequencies. At the same time, because historical information is not perfectly precise, the module further uses a dynamic preference-based work-stealing policy to balance the workload at runtime. Experimental results show that the workload-aware task scheduling module can reduce the execution time of parallel programs by up to 82.7% compared with the commonly used random work-stealing approach.

In a multi-core architecture with DVFS support, the HPEE system uses the energy-efficient workload-aware task scheduling module to schedule tasks in a high-performance and energy-efficient manner. Based on the task types and the workloads of tasks of the same type, this module uses a workload-aware frequency adjuster to search for the optimal frequency configuration for executing the current parallel program. At the same time, because historical information is not perfectly precise, the module also uses a preference-based task scheduler to balance the workload among different cores. Experimental results show that the energy-efficient workload-aware task scheduling module can reduce energy consumption by up to 29.8% with only a slight impact on performance (performance degradation is less than 3.7%).
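The history-based, preference-based scheme shared by the last two modules can be sketched as follows (again a minimal illustration; the class names, the two-queue layout, and the exponential-average update are assumptions, not the dissertation's actual data structures). A history table estimates the work of each task type from past executions; task types with heavy estimated workloads are routed to the fast (high-frequency) cores, and an idle core first drains its preferred queue and then helps with the other one, so that imprecise history is corrected at runtime.

    // Minimal sketch of history-based, workload-aware task allocation with a
    // preference-based fallback (illustrative; names and the two-queue layout
    // are assumptions, not the dissertation's actual data structures).
    #include <deque>
    #include <string>
    #include <unordered_map>
    #include <utility>

    struct Task {
        std::string type;            // task type used to index the workload history
        double work_estimate = 0.0;  // filled in from the history table before dispatch
    };

    struct History {
        std::unordered_map<std::string, double> avg_time;  // running average per task type (ms)

        void record(const std::string& type, double elapsed_ms) {
            auto it = avg_time.find(type);
            if (it == avg_time.end()) avg_time[type] = elapsed_ms;
            else it->second = 0.9 * it->second + 0.1 * elapsed_ms;  // exponential average
        }

        double estimate(const std::string& type) const {
            auto it = avg_time.find(type);
            return it == avg_time.end() ? 0.0 : it->second;
        }
    };

    struct AsymmetricScheduler {
        History history;
        std::deque<Task> fast_queue;   // served by high-frequency cores
        std::deque<Task> slow_queue;   // served by low-frequency cores
        double heavy_threshold = 1.0;  // ms; boundary between "heavy" and "light" task types

        // History-based placement: task types with heavy observed workloads go to fast cores.
        void submit(Task t) {
            t.work_estimate = history.estimate(t.type);
            if (t.work_estimate >= heavy_threshold) fast_queue.push_back(std::move(t));
            else                                    slow_queue.push_back(std::move(t));
        }

        // Preference-based stealing: an idle core first drains its preferred queue,
        // then helps with the other one, correcting imprecise history at runtime.
        bool try_pop(bool fast_core, Task& out) {
            auto& preferred = fast_core ? fast_queue : slow_queue;
            auto& other     = fast_core ? slow_queue : fast_queue;
            if (!preferred.empty()) { out = std::move(preferred.front()); preferred.pop_front(); return true; }
            if (!other.empty())     { out = std::move(other.back());      other.pop_back();      return true; }
            return false;
        }
    };

Locking and the DVFS frequency-search loop are omitted to keep the sketch short; the energy-efficient module would additionally pick a per-program frequency configuration before dispatching tasks.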
Keywords/Search Tags: Complex parallel architectures, Task scheduling, Runtime profiling, History-based scheduling, Energy efficiency