
Research on Data-Driven Performance Prediction and Optimization of HPC Programs

Posted on: 2021-02-11 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J W Sun | Full Text: PDF
GTID: 1368330602999126 | Subject: Computer software and theory
Abstract/Summary:
Performance modeling is a widely studied problem in the high-performance computing (HPC) community. Specifically, performance modeling uses theoretical analysis or empirical evaluation to quantitatively formulate the relation between the performance of an HPC application and its input parameters, system specifications, environment settings, and so on. Performance models can predict the execution time of HPC applications under varying conditions, which improves the effectiveness of job scheduling, resource management, performance tuning, and other tasks. Modern HPC ecosystems have become increasingly complicated, and traditional performance modeling methods, such as analytical modeling and replay-based modeling, suffer from either a heavy dependence on domain knowledge or large time and space costs. In recent years, techniques for sampling and analyzing performance data have developed substantially. With machine learning methods, data generated from HPC systems can be used to automatically construct performance models and efficiently predict the execution time of HPC applications. This thesis focuses on data-driven prediction and optimization of the performance of HPC applications. It mainly consists of the following three aspects:

(1) Performance modeling of HPC applications based on runtime features

Statistical models use machine learning techniques to fit the relation between features and performance. The input parameters of an HPC application are the most intuitive features and are widely adopted in related studies. However, important performance factors may not be explicitly covered by input parameters. We instead adopt runtime features to construct performance models, including values of variables and counters of branches, loops, and MPI communications. As a superset of input parameters, runtime features generally contain more complete and relevant information about what affects performance. By automatically instrumenting an MPI program, we can capture potential runtime features. Based on these features, we conduct a two-stage feature reduction and generate a performance model using a random forest. Our experiments and analyses of three parallel applications, Graph500, GalaxSee, and SMG2000, on real supercomputing systems confirm that our method can identify runtime performance features of HPC applications without domain knowledge and predict their performance precisely.

(2) Research on low-cost strategies for performance data sampling

Statistical performance models require a large amount of historical execution data, and measuring the performance of an HPC application can take hours to days, so collecting performance data is quite expensive. This thesis proposes a strategy that reduces the amount of data required without loss of model accuracy. The strategy consists of two stages: a transfer method based on an existing performance model, and a data sampling method using active learning. An HPC application exhibits similar behavior patterns across different machines. Inspired by this observation, we design a transfer method that reuses the performance model built on an existing machine, which effectively alleviates the cold-start problem when generating a performance model for a new machine. In addition, active learning techniques are adopted to analyze the redundancy of training data and further reduce data requirements. We evaluate our strategy using Graph500, GalaxSee, and SMG2000 on three different HPC systems. The results validate that our strategy can effectively reduce the data requirements for performance modeling without decreasing model accuracy.
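A minimal sketch of the two building blocks above, assuming a scikit-learn setup: a random-forest performance model over runtime features (aspect 1) and an active-learning step that selects the next configuration to measure where the forest's trees disagree most (aspect 2). The feature names, the synthetic data, and the tree-variance acquisition rule are illustrative assumptions, not the thesis's exact method.

```python
# Sketch: random-forest performance model + active-learning sample selection.
# Everything below (features, data, acquisition rule) is illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical runtime features (e.g. loop counts, branch counts, MPI message counts),
# with a synthetic "execution time" as the target.
X_train = rng.uniform(0, 1, size=(40, 3))
y_train = 5.0 * X_train[:, 0] + 2.0 * X_train[:, 1] ** 2 + rng.normal(0, 0.1, 40)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Candidate configurations whose runtime has not been measured yet.
X_pool = rng.uniform(0, 1, size=(200, 3))

# Uncertainty = variance of per-tree predictions; measure the most uncertain point next,
# so that each expensive run adds as much information as possible.
per_tree = np.stack([t.predict(X_pool) for t in model.estimators_])   # (n_trees, n_pool)
uncertainty = per_tree.var(axis=0)
next_idx = int(np.argmax(uncertainty))
print("next configuration to measure:", X_pool[next_idx],
      "predicted time:", per_tree.mean(axis=0)[next_idx])
```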
(3) Model-based parameter tuning of HPC applications

Configurable parameters have an important impact on the performance of HPC programs, so tuning parameters is an essential step in performance optimization. Since exploring a vast parameter space and measuring a program's performance under each configuration is resource- and time-consuming, exhaustively searching the space for the optimal configuration is usually impractical. Rather than actually executing and measuring, model-based tuning methods adopt models to predict the performance of HPC applications and prune the search space. In this thesis, we propose an efficient parameter tuning method called RankTune. It starts with a ranking model trained on a small set of performance samples. It then iteratively predicts the performance order of candidate parameter configurations, selects high-ranking candidates to expand the training set, and retrains the ranking model. This ranking-based method efficiently guides the search towards a high-performance subspace of the parameter space. Evaluations on nine parameter tuning tasks show that, compared with random selection and three existing sophisticated tuning methods, our method finds better parameter configurations with fewer application measurements.
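The ranking-guided loop can be sketched as follows. This is a minimal illustration, not the thesis's RankTune implementation: the synthetic objective stands in for measured execution time, a pairwise random-forest classifier stands in for the learning-to-rank model, and all names and constants are assumptions.

```python
# Sketch of a ranking-guided parameter tuning loop: rank unmeasured configurations,
# measure the highest-ranked one, retrain, repeat.
import numpy as np
from itertools import combinations
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def measure(cfg):
    # Stand-in for actually running the HPC program with configuration `cfg`.
    return (cfg[0] - 0.3) ** 2 + (cfg[1] - 0.7) ** 2 + rng.normal(0, 0.01)

candidates = rng.uniform(0, 1, size=(300, 2))                 # parameter search space
measured_idx = list(rng.choice(len(candidates), 10, replace=False))
times = {i: measure(candidates[i]) for i in measured_idx}     # small initial sample

for _ in range(5):                                            # a few tuning iterations
    # Build pairwise training data: label 1 if the first configuration is faster.
    pairs, labels = [], []
    for i, j in combinations(measured_idx, 2):
        pairs.append(candidates[i] - candidates[j])
        labels.append(1 if times[i] < times[j] else 0)
    ranker = RandomForestClassifier(n_estimators=100, random_state=0).fit(pairs, labels)

    # Score each unmeasured candidate by how often it is predicted to beat measured ones.
    unmeasured = [k for k in range(len(candidates)) if k not in times]
    diffs = np.array([candidates[k] - candidates[i] for k in unmeasured for i in measured_idx])
    wins = ranker.predict(diffs).reshape(len(unmeasured), len(measured_idx)).mean(axis=1)

    best = unmeasured[int(np.argmax(wins))]                   # highest-ranked candidate
    times[best] = measure(candidates[best])                   # measure it, expand the training set
    measured_idx.append(best)

best_idx = min(times, key=times.get)
print("best configuration found:", candidates[best_idx], "time:", times[best_idx])
```

The design point is that a ranking model only has to order candidates correctly, not predict absolute execution times, which is an easier target when measurements are scarce.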
Keywords/Search Tags: High Performance Computing (HPC), Performance Modeling, Machine Learning, Bayesian Optimization, Learning to Rank