
Research on Structure-Activity Relationship of Semiconductor Materials Based on Machine Learning

Posted on: 2020-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: F Hu
GTID: 2518306353956709
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the continuous enrichment and optimization of machine learning algorithms and the maturing of data mining technology, artificial intelligence methods have penetrated many interdisciplinary fields, and computational materials science is one of them. Although traditional materials calculation methods are highly accurate, they have a serious drawback: they are extremely slow. Research on new materials has traditionally proceeded through large numbers of repeated experiments and continuous optimization, with extensive manual screening of the preparation process of the target compound and of the proportions of its constituents. This approach has obvious shortcomings: it relies on pure experience and trial and error, and it requires repeated experiments and constant correction to reduce error as far as possible. Such trial and error is inefficient, and the experimental results are often unsatisfactory. As a result, the materials development cycle is too long, the method is simplistic, and the research lacks clear direction, which places a heavy burden on manpower and material resources in guiding the research and development of new materials.

This thesis uses machine learning algorithms and data mining technology to uncover the underlying mechanisms of materials science and to achieve the best combination of accuracy and speed. By combining materials science with informatics, detailed data mining procedures are designed, covering data acquisition, data analysis, data cleaning, feature extraction, feature transformation, feature selection, feature dimensionality reduction, algorithm training, prediction and evaluation.

The first task in solving a materials problem is to collect the corresponding data. First, band gap and formation energy data sets for binary compounds, as well as band gap and formation energy data sets for ternary compounds, are generated using the AiiDA high-throughput computing platform. The space group and atomic coordinate information in the original data sets is then preprocessed with a secondary encoding method, and the data sets are partitioned using 5-fold cross-validation. Next, the band gap and formation energy data of the binary compounds are fed into three algorithms (support vector regression, random forest and gradient boosting decision tree), and the corresponding experiments are compared and analyzed. Gradient boosting decision tree performs best, random forest second, and support vector regression worst. Data mining methods such as feature derivation, feature combination and PCA dimensionality reduction are then used to mine the features more thoroughly and improve prediction accuracy.

To apply the gradient boosting decision tree algorithm more effectively to the ternary compound data sets, the traditional algorithm is improved in three respects: a second-order Taylor expansion of the loss function, an added regularization penalty term, and a custom split gain computed from the objective before and after splitting. Experiments on the band gap and formation energy data sets of the ternary compounds show that the improved algorithm outperforms the traditional gradient boosting decision tree on both data sets. Finally, the feature importance of the improved gradient boosting decision tree during training and prediction is analyzed, and the conclusions are applied to the discovery of new materials.
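The evaluation protocol described above, 5-fold cross-validation used to compare several regressors, can be sketched in plain Python. This is a minimal illustration, not the thesis's code: the helper names `k_fold_indices`, `cross_val_rmse`, `fit_mean` and `fit_linear_1d` are hypothetical, RMSE is assumed as the error metric (the abstract does not name one), and two toy models stand in for support vector regression, random forest and gradient boosting decision tree. Any model that can be wrapped as a `fit(X, y) -> predict` callable can be plugged in.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# A "fit" function trains on (X, y) and returns a predict function for one sample.
Predictor = Callable[[Sequence[float]], float]
FitFn = Callable[[List[Sequence[float]], List[float]], Predictor]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 once, deal it into k near-equal folds; each fold is the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    splits = []
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((train, folds[f]))
    return splits


def cross_val_rmse(fit: FitFn, X: Sequence[Sequence[float]], y: Sequence[float],
                   k: int = 5, seed: int = 0) -> float:
    """Average test-fold RMSE of a model over k folds."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        mse = mean((predict(X[i]) - y[i]) ** 2 for i in test)
        scores.append(mse ** 0.5)
    return mean(scores)


def fit_mean(X: List[Sequence[float]], y: List[float]) -> Predictor:
    """Toy baseline (hypothetical): always predict the training-set mean."""
    m = mean(y)
    return lambda x: m


def fit_linear_1d(X: List[Sequence[float]], y: List[float]) -> Predictor:
    """Toy model (hypothetical): ordinary least squares on the first feature only."""
    xs = [row[0] for row in X]
    mx, my = mean(xs), mean(y)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, y))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x[0] - mx)
```

Because every model is scored on the same shuffled folds, differences in the averaged RMSE reflect the models rather than the split. In practice the toy models would be replaced by real regressors; scikit-learn, for example, provides `SVR`, `RandomForestRegressor`, `GradientBoostingRegressor` and `KFold`, although the abstract does not say which library the thesis used.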
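The three modifications listed for the improved gradient boosting decision tree (a second-order Taylor expansion of the loss, a regularization penalty, and a split gain computed before and after splitting) closely match the objective popularized by XGBoost. The sketch below assumes that formulation, with squared-error loss, an L2 penalty lam on leaf weights and a per-split penalty gamma; the thesis's exact regularizer and gain may differ. For a node with gradient sum G and Hessian sum H, the optimal leaf weight is w* = -G / (H + lam), and a candidate split is scored as Gain = 0.5 * [G_L^2/(H_L + lam) + G_R^2/(H_R + lam) - (G_L + G_R)^2/(H_L + H_R + lam)] - gamma. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def squared_loss_grad_hess(y: List[float], y_pred: List[float]) -> Tuple[List[float], List[float]]:
    """First and second derivatives of 0.5 * (y_pred - y)**2 with respect to y_pred."""
    grad = [p - t for t, p in zip(y, y_pred)]
    hess = [1.0] * len(y)  # the second derivative of squared error is constant
    return grad, hess


def leaf_weight(G: float, H: float, lam: float) -> float:
    """Optimal leaf value under the second-order approximation with L2 penalty lam."""
    return -G / (H + lam)


def split_gain(GL: float, HL: float, GR: float, HR: float, lam: float, gamma: float) -> float:
    """Reduction of the regularized objective from splitting a node, minus gamma."""
    def score(G: float, H: float) -> float:
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def best_split(x: List[float], grad: List[float], hess: List[float],
               lam: float = 1.0, gamma: float = 0.0) -> Optional[Tuple[float, float]]:
    """Exact greedy scan over one feature; returns (threshold, gain), or None if no gain is positive."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(grad), sum(hess)
    GL = HL = 0.0
    best_thr: Optional[float] = None
    best_gain = 0.0  # only accept splits that strictly reduce the regularized objective
    for k in range(len(order) - 1):
        i, nxt = order[k], order[k + 1]
        GL += grad[i]
        HL += hess[i]
        if x[i] == x[nxt]:
            continue  # no threshold can separate equal feature values
        gain = split_gain(GL, HL, G - GL, H - HL, lam, gamma)
        if gain > best_gain:
            best_thr, best_gain = (x[i] + x[nxt]) / 2.0, gain
    return None if best_thr is None else (best_thr, best_gain)
```

As a worked case: with targets [0, 0, 10, 10] at x = [1, 2, 3, 4], an initial prediction of 0, lam = 1 and gamma = 0, the best threshold is 2.5 with gain 80/3, and the right leaf weight is 20/3 rather than the unregularized 10, showing how lam shrinks leaf values. Raising gamma to 30 makes every candidate gain negative, so the node is not split, which is how the penalty term prunes the tree.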
Keywords/Search Tags: Machine learning, regression analysis, support vector, random forest, gradient boosting decision tree, material informatics