
Research On Application-transparent Strategies For Stacked Heterogeneous System

Posted on: 2020-11-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Li
Full Text: PDF
GTID: 1368330611993110
Subject: Electronic Science and Technology
Abstract/Summary:
In order to delay the end of Moore's law, emerging techniques are improving chip performance in new ways rather than through traditional semiconductor scaling. Among these directions, stacking technology and heterogeneous many-core acceleration have been gaining traction and are widely used in commercial products. Stacked heterogeneous systems have become important hardware resources in data centers. With the advancement and widespread adoption of cloud computing, stacked heterogeneous systems now provide powerful computing acceleration for the cloud. However, the hardware and software resources of cloud computing are provided to many users as services. Due to the complete isolation among tenants, it is difficult for programmers to optimize applications in cloud environments from the software perspective. It is therefore important to apply application-transparent performance optimization strategies based on hardware features.

This thesis addresses three issues: the performance degradation caused by memory oversubscription in a heterogeneous system, the high overhead of context switching in multi-tasking preemption, and the load imbalance of the stacked interconnect network. Several application-transparent strategies are proposed to solve these issues and efficiently optimize performance without modifying application code. The main contributions and innovations of this thesis are as follows:

(1) A framework for memory oversubscription management. Cloud providers usually virtualize costly hardware resources to increase resource utilization, but this can cause memory shortages for memory-hungry workloads (such as machine learning training). Memory oversubscription has been enabled in modern GPUs; however, our measurements on a real GPU show that it can cause severe performance degradation, so an application-transparent mechanism is urgently needed. In this thesis, we investigate the memory access behaviors of different applications and propose proactive eviction, memory-aware throttling, and capacity compression to address the different sources of overhead. Our framework for memory oversubscription management selects the most effective combination of these techniques, according to the application type, to recover the performance lost to memory oversubscription. Experimental results show that our framework is effective at improving performance under memory oversubscription.

(2) A dynamic, proactive preemption mechanism using checkpointing. Multi-tasking is widely used in stacked heterogeneous systems, and preemption support is a necessary technique for it: preemption can satisfy the quality-of-service requirements of different applications and provides more options for multi-tasking scheduling. However, due to its SIMT execution model, a GPU has a much larger context than a CPU, so the overhead of context switching is correspondingly higher. To address this issue, we observe the launching process of GPU kernels and dynamically apply a proactive preemption mechanism using checkpointing to reduce the overhead of context switching. Experimental results show that our proactive preemption mechanism reduces the preemption latency from 8.9 μs to 3.6 μs, making it easier to meet quality-of-service requirements.

(3) A dynamic latency-aware load-balancing strategy for 2.5D NoC architectures. The 2.5D stacked on-chip network is an emerging structure for silicon-interposer architectures: it utilizes the abundant metal resources on the silicon interposer to create an additional network layer that can absorb congested traffic. In current designs, however, protocol-level traffic among cores is transferred on the upper network layer while memory traffic is transferred on the lower layer. Our measurements on the PARSEC benchmarks show a severe load imbalance between the upper and lower layers. We find that accurate network congestion information can be inferred from the latency observed along congested paths. This thesis therefore proposes a latency-aware load-balancing strategy for the 2.5D NoC: congested paths are identified by latency, and packets select their paths accordingly. Experimental results show that our load-balancing strategy achieves a 45% performance improvement over the baseline.

In summary, this thesis focuses on stacked heterogeneous systems and addresses the need for application transparency. For the issues of memory oversubscription, multi-tasking switching management, and network load imbalance, we study the causes of the performance losses together with application characteristics and hardware features, and we propose effective solutions that improve system performance. This thesis therefore has both engineering value and theoretical significance.
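The per-application technique selection in contribution (1) can be sketched as a simple policy function. This is only an illustrative sketch: the profile fields and the mapping rules below are assumptions, since the abstract names the three techniques (proactive eviction, memory-aware throttling, capacity compression) but not the actual classification criteria.

```python
# Hypothetical sketch of the oversubscription-management framework's
# technique selection. The profile keys and thresholds are illustrative
# assumptions, not the thesis's actual classifier.

def select_techniques(profile: dict) -> list[str]:
    """Pick a combination of mitigation techniques for one application."""
    chosen = []
    # Regular streaming access lets us predict dead pages and evict
    # them before an oversubscription fault occurs.
    if profile.get("access_pattern") == "streaming":
        chosen.append("proactive_eviction")
    # Heavy thrashing suggests limiting concurrent thread blocks so
    # the working set fits in device memory.
    if profile.get("thrashing", False):
        chosen.append("memory_aware_throttling")
    # Highly compressible data can stay resident in compressed form
    # instead of being migrated back to host memory.
    if profile.get("compress_ratio", 1.0) > 1.5:
        chosen.append("capacity_compression")
    return chosen or ["proactive_eviction"]  # fall back to a default
```

A hybrid workload would simply receive several techniques at once, which mirrors the framework's goal of choosing the most effective combination per application type.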
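The core trade-off behind the checkpoint-based proactive preemption of contribution (2) can be shown with a toy model: a checkpoint taken proactively at a safe point lets preemption simply drop the live context (cheap) instead of saving it in full (expensive), at the cost of re-executing work done since the checkpoint. All class and method names here are hypothetical.

```python
# Toy model of checkpoint-based proactive preemption (contribution 2).
# Names are hypothetical; only the rollback idea is from the abstract.

class Kernel:
    def __init__(self, n_iters: int):
        self.n_iters = n_iters
        self.progress = 0      # live context (would be large on a GPU)
        self.checkpoint = 0    # small restart state saved proactively

    def run(self, steps: int) -> None:
        self.progress = min(self.n_iters, self.progress + steps)

    def take_checkpoint(self) -> None:
        # Proactively record restart state at a safe point.
        self.checkpoint = self.progress

    def preempt(self) -> None:
        # Drop the live context instead of saving it; work done since
        # the last checkpoint is re-executed after resume.
        self.progress = self.checkpoint

k = Kernel(n_iters=100)
k.run(40)
k.take_checkpoint()   # progress 40 is now restartable
k.run(25)             # progress 65, not yet checkpointed
k.preempt()           # cheap preemption: roll back to the checkpoint
assert k.progress == 40
```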
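For contribution (3), the latency-driven path selection can be sketched as each router keeping a latency estimate per network layer and steering packets onto the less congested one. The exponentially weighted moving average estimator and all names here are assumptions for illustration; the abstract states only that congested paths are identified by latency.

```python
# Illustrative sketch of latency-aware layer selection in a 2.5D NoC.
# The EWMA estimator and field names are assumptions for illustration.

class LayerSelector:
    def __init__(self, alpha: float = 0.25):
        self.alpha = alpha
        self.latency = {"upper": 0.0, "lower": 0.0}  # EWMA per layer

    def observe(self, layer: str, sample: float) -> None:
        # Blend a new latency sample into the running estimate.
        old = self.latency[layer]
        self.latency[layer] = (1 - self.alpha) * old + self.alpha * sample

    def route(self) -> str:
        # Send the next packet on whichever layer currently looks faster.
        return min(self.latency, key=self.latency.get)

sel = LayerSelector()
for s in (12.0, 14.0, 13.0):   # upper (core) layer congested
    sel.observe("upper", s)
for s in (5.0, 6.0):           # interposer layer lightly loaded
    sel.observe("lower", s)
assert sel.route() == "lower"
```

Rebalancing traffic this way is what lets the interposer layer absorb load that would otherwise pile up on the upper layer.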
Keywords/Search Tags:Stacking Technology, Heterogeneous System, Application-transparent, Memory Oversubscription, Virtual Memory, Multi-task Switch, Network-on-Chips, Load-balancing