Due to the depletion of fossil energy and serious environmental pollution, photovoltaic energy, as a new, cheap, and environmentally friendly alternative energy source, is of great significance in all fields of society. The Material Project passed by the United States in 2011 and the Material Clouds passed by China in 2018 both list the application of machine learning to new photovoltaic materials among their national key development plans. However, in the field of new energy materials, the scarcity of existing data, the low reliability of data sources, unclear data formats, and other problems prevent the powerful data processing ability of machine learning from being fully used in the analysis and design of new functional materials. Therefore, in the early stage of research, it is necessary to screen and process the data set used for property prediction by combining knowledge of machine learning and materials science. In this work, metal chloride perovskite, the most rapidly developing photovoltaic material in recent years, is selected as the research object, with a focus on stable and non-toxic inorganic lead-free halide perovskites. To improve the accuracy of the calculation, the initial data set is filtered based on the properties of the materials, and a gradient boosting regression tree (GBRT) model and Shapley Additive Explanation are then used to evaluate the filtering results. It was found that the elements, the crystal structure, and the one-dimensional descriptor ι play the decisive roles in band gap prediction, while the elements play the decisive role in formation energy prediction. After property prediction, Shapley Additive Explanation is used to explain the results of the GBRT model. For band gap prediction, the electronegativity of the B-site atom has the highest Shapley feature importance and is positively correlated with the prediction. For formation energy prediction, the electronegativity of the B-site atom (B_Electronegativity) and the first ionization potential of the X-site atom (X_FirstIonizationPotential) have high Shapley feature importance and are positively and negatively correlated with the prediction, respectively. Moreover, the atomic properties associated with the X-site elements have a joint influence on the prediction of formation energy. This work emphasizes that the selection and processing of data sets play an important role in improving the accuracy of machine learning predictions of material properties. At the same time, the Shapley model shows good prospects for uncovering the mechanisms that influence material properties and for supplementing and verifying the existing body of knowledge in materials science.
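The Shapley attribution step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function `toy_model` is a hypothetical stand-in for a trained GBRT, its coefficients and the input values are invented for illustration only, and the third feature name (`X_Electronegativity`) is an assumption. The sketch computes exact Shapley values by enumerating feature coalitions, with absent features replaced by a baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; absent features take baseline values."""
    n = len(x)
    features = range(n)

    def value(coalition):
        # Features in the coalition keep their real value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical stand-in for a trained GBRT; coefficients are illustrative only.
    b_en, x_ip, x_en = z
    return 1.5 * b_en - 0.8 * x_ip + 0.5 * x_ip * x_en


# Feature names follow the abstract; the third name is an assumption.
feature_names = ["B_Electronegativity", "X_FirstIonizationPotential", "X_Electronegativity"]
x = [2.0, 1.0, 3.0]          # illustrative sample, not real material data
baseline = [1.0, 0.5, 1.0]   # illustrative reference point

phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(feature_names, phi):
    print(f"{name}: {contribution:+.3f}")
```

By construction, the contributions sum to the difference between the prediction at `x` and the prediction at `baseline`, and the sign of each contribution indicates whether that feature pushes the prediction up or down for the given sample. Exact enumeration scales exponentially with the number of features, so in practice a tree-specific approximation would normally be used for a real GBRT.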