
Study On Active Learning Methods For Empirical Performance Modeling Of High Performance Computing Programs

Posted on: 2021-04-26    Degree: Master    Type: Thesis
Country: China    Candidate: J P Zhang    Full Text: PDF
GTID: 2428330602999109    Subject: Computer software and theory
Abstract/Summary:
HPC (High Performance Computing) is widely used in many fields such as astrogeophysics and atmospheric and oceanic environment modeling, and performance is key for HPC programs. HPC programs usually expose adjustable parameters, such as the number of cores and algorithm alternatives, and studies have shown that performance tuning can accelerate a program by a factor of 10 or even 100. However, the relationship between the parameters and the performance of HPC programs is often a complex nonlinear function, which makes performance tuning extremely difficult. Empirical Performance Modeling (EPM) can fit this complex relationship well and enables efficient heuristic parameter search. However, a large number of samples is required as training data for empirical performance modeling, and HPC programs usually occupy a large amount of computing resources and take a long time to run, from hours up to months. These pain points lead to the high computational and time cost of EPM. To reduce the modeling cost, existing work proposed PBUS (Performance Biased Uncertainty Sampling), an active-learning-based modeling method that first samples potentially high-performance configurations and then uses an active learning algorithm to select the samples with the highest uncertainty, thereby reducing data redundancy. Compared with random and uniform sampling, PBUS reduces the required sample data to a certain extent, but our experiments show that PBUS's separation of performance and uncertainty into two independent factors has shortcomings and may cause even more serious data redundancy, so there is still much room for improvement.

To address the data redundancy of existing methods, this dissertation proposes a new active learning method that makes full use of the information in the existing data and can efficiently explore high-performance samples in the parameter space; that is, it strikes a good balance between exploiting known information and exploring unknown space. Specifically, we design a Performance Weighted Uncertainty (PWU) sampling strategy in the active learning algorithm to identify samples with high uncertainty or high performance (a minimal sketch of this selection step is given below). Samples with high uncertainty reduce the information redundancy between data, and the labeling cost of high-performance samples is lower than that of poor-performance ones, so PWU achieves two goals at the same time: it reduces the required training data and avoids the high labeling cost of poor-performance samples. In addition, the PWU strategy combines the two factors of uncertainty and performance into a single criterion, avoiding the potential drawbacks of the PBUS method.

To verify the effectiveness of our method, we use random forests to build empirical performance models for 12 computing kernels from the SPAPT suite and two typical parallel scientific computing applications (Kripke and Hypre). The experimental results show that the proposed method maintains the same prediction accuracy while, compared with the PBUS method, achieving a modeling speedup of up to 21 times and 3 times on average. Moreover, regardless of the target program or the modeling requirements, the prediction results of the performance model built with PWU are more stable, indicating that PWU is more robust. In addition, a performance tuning experiment based on the PWU empirical performance model is carried out in this work. The results show that the performance model established by the PWU strategy improves not only the efficiency of tuning but also its quality. In summary, PWU successfully overcomes the shortcomings of existing methods, significantly reduces possible data redundancy, and has clear advantages in terms of modeling efficiency, model quality, and method robustness.
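Below is a minimal sketch of how such a performance-weighted uncertainty selection step could look with a random forest ensemble. It assumes scikit-learn's RandomForestRegressor, a pool of unlabeled candidate configurations, and a performance metric where higher is better; the function name pwu_select and the specific weighting formula are illustrative assumptions, since the abstract does not give the exact PWU scoring function.

```python
# Minimal sketch of one performance-weighted uncertainty sampling step,
# assuming scikit-learn's RandomForestRegressor and a performance metric
# where larger values are better. The combined score below is illustrative;
# the exact PWU weighting is defined in the thesis, not in this abstract.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pwu_select(model: RandomForestRegressor, candidates: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of k candidate configurations that combine high
    ensemble uncertainty (exploration) with high predicted performance
    (exploitation)."""
    # Per-tree predictions give an ensemble of performance estimates.
    per_tree = np.stack([tree.predict(candidates) for tree in model.estimators_])
    predicted_perf = per_tree.mean(axis=0)   # exploitation term
    uncertainty = per_tree.std(axis=0)       # exploration term (tree disagreement)

    def rescale(x):
        # Map a score vector to [0, 1] so the two terms are comparable.
        return (x - x.min()) / (np.ptp(x) + 1e-12)

    # Illustrative weighting: uncertainty weighted up by predicted performance,
    # so informative and likely high-performance configurations rank first.
    score = rescale(uncertainty) * (1.0 + rescale(predicted_perf))
    return np.argsort(score)[-k:]

# Typical active-learning loop (sketch): fit the forest on measured
# (configuration, performance) pairs, pick the next batch with pwu_select,
# measure those configurations, append them to the training set, and refit.
```

The per-tree spread is used here as the uncertainty estimate because a random forest exposes its ensemble directly through estimators_, so no separate uncertainty model is needed.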
Keywords/Search Tags:Performance Modeling, Performance Tuning, Modeling Cost, Machine Learning, Active Learning, Random Forest, Sampling Strategy