
Machine Learning Based Energy Efficiency Modeling Of Computing Nodes In High-Performance Computing Systems

Posted on: 2022-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: X X Qi
Full Text: PDF
GTID: 2558307169479264
Subject: Computer Science and Technology
Abstract/Summary:
Energy efficiency is one of the key issues facing the development of exascale high-performance computing (HPC) systems. Reducing energy consumption while system performance continues to improve has become an important goal for HPC development in many countries. Energy efficiency modeling, that is, accurately capturing the real-time performance, power consumption, and energy consumption of an HPC system at its various levels (computing components, nodes, plug-ins, and cabinets) with the help of hardware or software tools, is the basis and prerequisite for energy efficiency optimization. It can be further divided into power modeling and performance modeling.

Energy efficiency modeling for computing nodes is particularly important. First, it gives users information about performance, power consumption, and energy consumption, and guides them in further performance tuning. Second, starting from the computing node, energy efficiency modeling can easily be refined down to computing components such as the CPU and DRAM, and can also be extended to multiple nodes and cabinets, ultimately covering the whole system. Because they are real-time and non-intrusive, modeling methods based on hardware performance monitoring counters (PMCs) have become the mainstream approach to energy efficiency modeling. Using machine learning models, this thesis addresses the limitations of existing PMC-based modeling methods in three respects.

Firstly, although the PMC information provided by processors is widely used in previous power modeling methods, the temporal and spatial dependence between PMCs has not been fully explored. This thesis introduces a graph convolutional network (GCN) to extract the spatial dependence between PMCs and a gated recurrent unit (GRU) to mine the temporal dependence, and combines the two models to propose an empirical modeling method based on spatio-temporal dependence.

Secondly, a high-precision model requires a training set that not only contains enough samples but also covers a rich variety of program types, which makes designing a comprehensive and balanced training set very challenging. Because extremely randomized trees keep prediction error well balanced on data sets with imbalanced samples, this thesis builds on them, adopts a combined online and offline data collection method, and designs and implements an instantaneous power monitoring framework.

Finally, because of the complexity of applications and the large gaps between platforms, it is difficult to build accurate, low-overhead energy efficiency models, especially in cross-platform scenarios with different processors or instruction set architectures. To address the high cost of data collection in existing cross-platform modeling, this thesis proposes a hierarchical transfer learning method that reduces data along two dimensions, samples and features, and so keeps model accuracy high while lowering the data collection cost.
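To make the general idea of PMC-based power modeling concrete, the following is a minimal sketch, not the thesis's method. It swaps in a plain linear least-squares model in place of the GCN, GRU, extremely randomized tree, and transfer learning models named in the abstract. The function names, the choice of two counters, and all numbers are hypothetical and exist only for illustration.

```python
# Minimal sketch (assumption: a linear model stands in for the thesis's
# machine learning models). All counters and values below are synthetic.

def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot candidate into position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_power_model(pmc_samples, power_watts):
    """Least-squares fit of power = w0 + sum(w_i * pmc_i) (normal equations)."""
    rows = [[1.0] + [float(v) for v in s] for s in pmc_samples]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * p for r, p in zip(rows, power_watts)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_power(weights, pmc_sample):
    """Predict node power (watts) from one vector of PMC readings."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], pmc_sample))


def energy_joules(power_watts, interval_s):
    """Integrate evenly spaced power samples with the trapezoidal rule."""
    pairs = zip(power_watts, power_watts[1:])
    return sum((p0 + p1) / 2.0 * interval_s for p0, p1 in pairs)


# Hypothetical training data: two normalized counter rates per sample,
# with power generated from 50 + 10 * x1 + 2 * x2 (idle power plus activity).
samples = [(1, 0), (0, 1), (2, 1), (1, 3), (3, 2)]
measured = [60.0, 52.0, 72.0, 66.0, 84.0]

weights = fit_power_model(samples, measured)
trace = [predict_power(weights, s) for s in samples]
print("fitted weights:", weights)
print("estimated energy over the trace (J):", energy_joules(trace, 1.0))
```

Because the synthetic measurements are exactly linear in the two counters, the fit recovers the generating coefficients up to floating-point rounding. Real PMC data is noisy and nonlinear, which is one reason the thesis turns to more expressive models.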
Keywords/Search Tags: High Performance Computing, Energy Efficiency Model, Power Modeling, Performance Modeling, Machine Learning