Materials Data Standardization And Several Machine Learning Applications

Posted on: 2021-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wang
Full Text: PDF
GTID: 2428330614456803
Subject: Computer application technology
Abstract/Summary:
Machine learning has developed rapidly in recent years and is now widely applied not only in computer science but also in many traditional disciplines. Materials science is a traditional, largely experimental discipline in which developing new materials is costly, so data-driven materials genome engineering has become a popular research direction. However, materials data are multi-source, heterogeneous, small-sample, and noisy, which limits the application of machine learning methods. This thesis therefore focuses on materials data standardization and several machine learning applications. The main work includes:

1. To address the difficulty of expressing complex materials data, this thesis proposes a materials data specification design method based on a hierarchical design idea. It uses Backus-Naur form to define specifications, XML Schema Definition (XSD) to implement them, Extensible Markup Language (XML) to store data, and Extensible Stylesheet Language (XSL) to display and output data. In addition, a visual materials data specification design system and dozens of materials data specifications built with it demonstrate that the design method is formal, flexible, easy to use, and extensible, and that it can handle complex heterogeneous materials data.

2. For the large-scale materials data clustering problem, a Spark-based artificial bee colony clustering algorithm is proposed. The algorithm evaluates the clustering error function on a Spark cluster so that it can handle large-scale data sets. In the clustering analysis of glass-forming ability, its maximum clustering error is only 63% of the maximum clustering error of the K-means algorithm, which indicates that the proposed algorithm is more stable. In the multi-node Spark cluster experiment, the speedup from one node to four nodes is close to linear, which indicates that the algorithm achieves a good speedup when there is enough data.

3. For the multi-objective optimization problem of materials, a multi-objective genetic algorithm with linear constraints is proposed. The main innovation of this algorithm is to use combination coding to enforce linear equality constraints, so that the solutions it produces satisfy the constraints exactly. Compared with the grid search method, which has exponential time complexity, the algorithm can adjust its running time flexibly according to the required quality of the solution set. On the multi-objective optimization problem of the negative-expansion temperature range and the negative thermal expansion coefficient of negative thermal expansion materials, over 20 randomly initialized runs, the ratio of Pareto-optimal solutions contained in the result set is 100% at maximum, 72% at minimum, and 81% on average. At the same time, the extent of the result set along the two objectives reaches 113% and 88% of the range of the original data set, respectively. This indicates that the algorithm searches efficiently and that the result set is well distributed.

4. To make materials machine learning more automated and intelligent, an automatic workflow design is proposed and an automatic machine learning workflow system is developed. The system builds workflows visually. Its back-end workflow engine automatically completes data reading, feature selection, algorithm training, hyperparameter selection, multi-objective optimization, and related steps. The system greatly reduces the complexity of applying machine learning in the materials field.
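The Pareto-optimal ratio reported for the multi-objective genetic algorithm can be illustrated with a small sketch. This is not the thesis implementation: the function names `dominates`, `pareto_front`, and `pareto_ratio` are hypothetical, both objectives are assumed to be maximized, and the ratio is assumed to mean the fraction of result-set points that no point in the original data or the result set dominates. The sample points are made up for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective and
    strictly better in at least one (assumption: maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def pareto_ratio(result_set: List[Point], reference: List[Point]) -> float:
    """Fraction of result_set points that are non-dominated with respect
    to the pooled reference and result points (an assumed definition)."""
    if not result_set:
        return 0.0
    pool = list(reference) + list(result_set)
    kept = [p for p in result_set if not any(dominates(q, p) for q in pool)]
    return len(kept) / len(result_set)


# Made-up two-objective points, for illustration only.
reference = [(1.0, 5.0), (3.0, 3.0), (5.0, 1.0), (2.0, 2.0)]
result_set = [(3.0, 3.0), (4.0, 2.5), (1.0, 1.0), (2.0, 4.5)]

front = pareto_front(reference)
ratio = pareto_ratio(result_set, reference)
print(front)
print(ratio)
```

In this sketch the point (1.0, 1.0) is dominated by (3.0, 3.0), so three of the four result points count as Pareto-optimal. The pairwise check is quadratic in the number of points, which is adequate for small result sets; a larger study would likely use a sorting-based front extraction instead.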
Keywords/Search Tags: Materials data specification, Artificial bee colony algorithm, Multi-objective genetic algorithm, Automatic workflow