Research on Performance Prediction and Energy Efficiency Optimization of HPC Programs

Posted on: 2021-09-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Hao
GTID: 1488306569484254
Subject: Cyberspace security
Abstract/Summary:
As high-performance computing (HPC) develops, the scale and complexity of HPC systems have increased significantly. Supercomputer performance continues to advance from petascale to exascale. This poses significant challenges for porting and optimizing parallel programs.

When parallel programs are ported to large-scale HPC systems, they often suffer from low execution efficiency and poor scalability. As a result, they cannot fully use the computing resources of the hardware. This wastes the computing and power resources of the HPC system and increases operational overhead.

In addition, power and power-supply limitations have made energy and power consumption vital design constraints for large-scale HPC systems. This is especially true for future power-constrained exascale supercomputers. It is therefore necessary to:
- build performance prediction models for parallel programs;
- identify their performance and scalability bottlenecks;
- propose co-optimization methods that account for system and application characteristics under power constraints, so as to improve the energy efficiency of HPC systems.

This thesis focuses on performance prediction and energy efficiency optimization of parallel programs on HPC systems. It consists of four main parts, as follows.

First, the thesis proposes a scalability prediction framework based on the compiler-level intermediate representation (IR). Its goal is to predict program scalability before HPC programs are migrated at large scale, using a small-scale prototype system or a subset of the target HPC system. The framework combines compilation techniques with fine-grained regression analysis to model the computation and communication of HPC programs separately. To reduce modeling overhead, we propose two techniques:
- a hybrid basic-block profiling and code-pruning algorithm in the computation prediction module;
- a fine-grained regression modeling method in the communication prediction module.

The whole process requires no guidance from domain experts and fully automates performance modeling. We ran experiments on the Taub cluster and the Tianhe-2 supercomputer using real HPC applications. The prediction error ranges from 0.35% to 11.61% across applications, with an average of 4.28%. Compared with traditional purely regression-based methods, this method predicts application performance on large-scale HPC systems more accurately.

Second, the thesis proposes a multi-parameter performance modeling and prediction framework. Its goal is multi-parameter performance prediction on the target system after large-scale migration. The framework uses basic block frequencies (BBF) as features. It applies a machine learning algorithm to automatically construct multi-parameter performance models with high generalization ability. To reduce prediction overhead, we use two techniques:
- feature filtering strategies that reduce the number of features in the training stage;
- a serial program, called the BBF collector, built for each target application to quickly collect feature values in the prediction stage.

We ran experiments on the Taub cluster and the Tianhe-2 supercomputer using real HPC applications. Our method predicts more accurately than other modeling methods based on input parameters. Its average prediction error is 6.09%. Its average prediction overhead is less than 0.13% of the total execution time in the prediction stage.

Third, the thesis proposes a method for automatically constructing general benchmarks for HPC applications. Its goal is to evaluate the performance of HPC programs ported across platforms. The input is the set of traces collected from the parallel processes while the original parallel program executes. From these traces, the method automatically generates a high-fidelity benchmark that fully reflects the computation, communication, and I/O behavior of the original program. We ran experiments on the Taub cluster and the Tianhe-2 supercomputer using real parallel applications. The generated benchmark accurately preserves the performance characteristics of the original program and accurately predicts its performance. In addition, proportionally reducing the number of loop iterations shortens the benchmark's execution time and thus lowers the prediction overhead. Compared with actual execution, the framework achieves a 10x speedup in performance prediction, with prediction errors below 10%.

Finally, the thesis combines power capping with uncore frequency scaling. It proposes an approach that predicts Pareto-optimal powercap configurations for parallel applications on power-constrained systems. The approach proceeds in four steps:
- It builds a training set from carefully designed micro-benchmarks and a small number of existing benchmarks.
- It builds multi-objective performance and energy models with a multi-objective machine learning algorithm. This algorithm combines the stacked single-target method with extreme gradient boosting (XGBoost).
- It uses these models to predict the optimal processor and memory powercap settings, which helps compute nodes perform fine-grained powercap allocation.
- Once the optimal powercap configuration is determined, it applies uncore frequency scaling to further reduce energy consumption.

Compared with the reference powercap configuration, the optimal configurations predicted by this method achieve an average powercap reduction of 31.35% and an average energy reduction of 12.32%. The average performance degradation is only 2.43%.
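The abstract does not give the regression form used in the scalability prediction framework. The sketch below illustrates only the general idea of fitting runtimes measured at small process counts and extrapolating to a larger scale. It assumes a hypothetical Amdahl-style model T(p) = a + b/p, fitted by ordinary least squares in x = 1/p. It also shows the relative-error metric behind accuracy figures such as the 4.28% average. The model form, the function names, and the timing data are illustrative assumptions. This is not the thesis's IR-based method, which models computation and communication separately.

```python
def fit_amdahl(procs, times):
    """Least-squares fit of the hypothetical model T(p) = a + b / p.

    With x = 1 / p the model is linear in x, so simple linear regression
    yields a (the non-scaling part) and b (the perfectly parallel part).
    """
    xs = [1.0 / p for p in procs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, times))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, p):
    """Predicted runtime at p processes under the fitted model."""
    return a + b / p


def relative_error(predicted, measured):
    """Relative prediction error |predicted - measured| / measured, in percent."""
    return abs(predicted - measured) / measured * 100.0


# Hypothetical small-scale timings (seconds), generated from T(p) = 2 + 64 / p.
procs = [4, 8, 16, 32]
times = [2 + 64 / p for p in procs]
a, b = fit_amdahl(procs, times)
t1024 = predict(a, b, 1024)  # extrapolate to a much larger process count
print(a, b, t1024)
print(relative_error(t1024, 2.07))  # error against a hypothetical measured time
```

Real applications usually need richer models than this, for example a communication term that grows with log p. This is one reason the thesis models communication separately.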
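The benchmark-generation part says the generated benchmark's runtime can be cut by proportionally reducing loop iterations. A reduced run must then be scaled back up to a full-length estimate. One simple way to do this is linear rescaling, assuming runtime splits into a fixed non-loop part plus loop iterations of roughly constant cost. The sketch below shows that rescaling and the resulting prediction speedup. The runtime split, the function names, and the numbers are all hypothetical and are not taken from the thesis.

```python
def extrapolate_runtime(t_reduced, t_fixed, reduction_factor):
    """Estimate full-length runtime from a benchmark run with fewer loop iterations.

    Assumes (hypothetically) runtime = t_fixed + iterations * cost_per_iteration,
    and that the reduced benchmark runs 1 / reduction_factor of the original
    iterations at the same per-iteration cost.
    """
    if reduction_factor < 1:
        raise ValueError("reduction_factor must be >= 1")
    loop_time = t_reduced - t_fixed  # time spent in the shortened loops
    return t_fixed + loop_time * reduction_factor


def prediction_speedup(t_actual, t_reduced):
    """How many times faster the reduced benchmark runs than the real program."""
    return t_actual / t_reduced


# Hypothetical numbers: 5 s of setup, loops cut to 1/20 of their iterations.
estimate = extrapolate_runtime(t_reduced=12.0, t_fixed=5.0, reduction_factor=20)
print(estimate)
print(prediction_speedup(t_actual=150.0, t_reduced=12.0))
```

The linear assumption breaks down when per-iteration cost depends on the iteration count, for example through cache warm-up or convergence-dependent work. In such cases the reduction factor cannot be chosen arbitrarily.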
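The powercap part predicts Pareto-optimal processor and memory powercap configurations. The sketch below shows only the final selection step. It assumes a model has already predicted execution time and energy for each (processor cap, memory cap) pair, and it keeps the configurations that no other configuration dominates in both objectives. It also reports the percentage change against a reference configuration, the form in which the abstract states its results. The field names and predicted values are hypothetical. The stacked single-target XGBoost models that would produce these predictions are not reproduced here.

```python
from typing import NamedTuple


class Config(NamedTuple):
    cpu_cap_w: float  # processor powercap in watts
    mem_cap_w: float  # memory powercap in watts
    time_s: float     # predicted execution time in seconds
    energy_j: float   # predicted energy in joules


def dominates(x, y):
    """True if x is no worse than y in time and energy, and strictly better in one."""
    no_worse = x.time_s <= y.time_s and x.energy_j <= y.energy_j
    better = x.time_s < y.time_s or x.energy_j < y.energy_j
    return no_worse and better


def pareto_front(configs):
    """Keep the configurations that no other configuration dominates (O(n^2) scan)."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def pct_change(new, ref):
    """Signed percentage change of new relative to ref."""
    return (new - ref) / ref * 100.0


# Hypothetical model predictions; the first entry is the reference configuration.
configs = [
    Config(120, 40, 100.0, 20000.0),
    Config(90, 30, 102.0, 17500.0),
    Config(70, 25, 115.0, 17000.0),
    Config(90, 40, 103.0, 18500.0),
]
front = pareto_front(configs)
ref = configs[0]
for c in front:
    print(c.cpu_cap_w, c.mem_cap_w,
          round(pct_change(c.energy_j, ref.energy_j), 2),
          round(pct_change(c.time_s, ref.time_s), 2))
```

The quadratic scan is adequate for the small number of cap settings a node typically offers. A scheduler would then pick one point on the front according to how much slowdown it tolerates.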
Keywords/Search Tags: Performance Modeling, Performance Prediction, Energy Efficiency Optimization, Power-Constrained System, High-Performance Computing