As a representative class of new metallic materials, high-entropy alloys (HEAs) show excellent high-temperature strength, corrosion and wear resistance, and thermal stability, which indicates broad prospects for application. Because of the multi-principal-component design concept of HEAs, traditional approaches such as empirical trial and error, thermodynamic simulation, and first-principles calculation are insufficient for guiding the composition design and performance optimization of these alloys, whereas data-driven methods, represented by machine learning, offer a new approach to the research and design of HEAs. Machine-learning-based studies of HEAs rely on model predictions, and model performance is limited by the scarcity of HEA data. Expanding the dataset can effectively alleviate this limitation. To address the problem of insufficient data in HEA hardness prediction, this paper proposes a new data augmentation method based on generative adversarial networks (GANs), termed the two-step data augmentation method. The method can be applied to regression tasks with a small number of training samples, and it improves the prediction performance of a regression model by adding generated data to the existing data. The performance of the two-step data augmentation method was tested on a dataset of 3099 samples under different degrees of data insufficiency. The results show that, within a certain range, the scarcer the data, the more significantly the prediction performance of the model improves after data augmentation. Even when the entire training set is used, the error of the model with the two-step data augmentation method is 6.1% lower than the best error reported in the literature. To predict the hardness of HEAs, four key features were determined through feature construction, feature selection, and model evaluation. The two-step data augmentation method was used to improve the prediction accuracy of the model, and the influence of the number of generated samples on model performance was discussed. In addition, seven HEA samples were collected to evaluate the generalization ability of the model. The results show that the two-step data augmentation method can effectively alleviate the problem of insufficient data in HEA hardness prediction. Model-agnostic interpretability methods, including feature importance, partial dependence plots, and SHAP, were applied to the trained HEA hardness prediction model to understand why the model makes its decisions, how the predictions relate to each feature, and how strongly each feature influences the predictions. In particular, the partial dependence plots of two important features provide a reasonable range of feature values to guide the design of high-hardness HEAs. Finally, the trend in the relative importance of the features explains why the two-step data augmentation method improves the model's prediction performance.
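Two of the interpretability tools named above, partial dependence and feature importance, can be illustrated with a minimal sketch. It is not the model or data used in this work: `toy_predict`, the two-feature rows, and all numeric values are hypothetical stand-ins, feature importance is shown in its permutation-based form as one common model-agnostic variant, and SHAP is omitted.

```python
import random
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        # Overwrite one feature in every sample, keep the others unchanged.
        modified = [row[:feature] + [value] + row[feature + 1:] for row in rows]
        curve.append(mean(predict(r) for r in modified))
    return curve


def permutation_importance(predict, rows, targets, feature, seed=0):
    """Increase in mean absolute error after shuffling one feature column."""
    def mae(data):
        return mean(abs(predict(r) - t) for r, t in zip(data, targets))

    column = [row[feature] for row in rows]
    random.Random(seed).shuffle(column)
    shuffled = [row[:feature] + [v] + row[feature + 1:]
                for row, v in zip(rows, column)]
    return mae(shuffled) - mae(rows)


# Hypothetical stand-in for a trained hardness model:
# feature 0 drives the prediction, feature 1 is ignored.
def toy_predict(row):
    return 200.0 + 50.0 * row[0] + 0.0 * row[1]


# Hypothetical samples; the targets are taken from the toy model itself.
rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
targets = [toy_predict(r) for r in rows]

pd_curve = partial_dependence(toy_predict, rows, feature=0, grid=[0.0, 1.0, 2.0])
imp_used = permutation_importance(toy_predict, rows, targets, feature=0)
imp_unused = permutation_importance(toy_predict, rows, targets, feature=1)

print("partial dependence on feature 0:", pd_curve)
print("importance of feature 0:", imp_used)
print("importance of feature 1:", imp_unused)
```

In this toy setting the partial dependence curve rises linearly with feature 0, and shuffling the ignored feature leaves the error unchanged. Reading a favorable value range from a partial dependence curve and ranking features by importance are the kinds of conclusions described above.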