
Design And Implementation Of An Integrated Platform For Material High-throughput Computing And Machine Learning

Posted on: 2022-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: B Sun
GTID: 2518306314951639
Subject: Software engineering
Abstract/Summary:
In recent years, with the rapid development of computer science, its overlap with other industries has grown steadily. In the materials field, virtual design technologies built around material data mining, high-throughput screening, computational materials simulation and design, and artificial intelligence are now widely used in the design of new materials, in research and development, and in the regulation of preparation and synthesis processes, with better results than expected. China has put forward a materials genome engineering program to speed up the development of materials, but it still lags behind the United States, Switzerland, Japan and other countries in digitalization. Therefore, in promoting materials informatics in China, three questions have become urgent: how to build a reasonable material database, how to accumulate data rapidly by combining it with high-throughput computing, and how to use appropriate machine learning algorithms to better help material workers.

This thesis builds an integrated platform for high-throughput computing and machine learning. High-throughput calculations are run on data from the material database to expand and accumulate data, and on that basis data mining and experimental simulation are carried out with machine learning algorithms. The specific contents are as follows:

(1) Build the high-throughput calculation module. Through a visual interface, users select one or more materials to be calculated; the platform extracts valid calculation data from the database and automatically generates the high-throughput input files: POTCAR, POSCAR, KPOINTS and INCAR. Multiple input file packages are generated and submitted to the calculation queue to run in parallel, which removes the inefficiency of building input file packages and submitting them by hand. The output file packages are also processed: the running status is displayed visually, and the high-throughput results can be analyzed and the calculated material data saved back to the database conveniently. This speeds up data accumulation in the database and provides a data foundation for machine learning and data mining. The database is itself a basic part of high-throughput computing. The initial data is obtained with web crawler technology. After an overall analysis of data in the materials field, and with reference to the back-end database of the popular Materials Project platform, MongoDB is used for data storage.

(2) Design the machine learning module. It combines popular existing machine learning algorithms with material data so that users can easily run machine learning through a visual interface. Material datasets are typically small, and experiments showed that traditional machine learning algorithms are better suited to small material datasets, so the platform provides the better-performing algorithms, such as the support vector machine (SVM), decision tree and random forest, to better support material workers.

Repeated tests show that the integrated platform for material high-throughput computing and machine learning designed in this thesis is functionally complete. The high-throughput computing function effectively helps material workers run material calculations quickly and in large batches, the machine learning function helps users perform data mining, and the element data expansion function effectively improves training accuracy. The platform has been successfully deployed on a server at the Shanghai Institute of Ceramics, Chinese Academy of Sciences, and has provided data support for a paper by students at the institute.
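Step (1) of the abstract, generating one input file package per selected material, can be sketched roughly as follows. This is a minimal illustration rather than the thesis's implementation. It assumes the input files are VASP inputs (the file names suggest this, but the abstract does not say so), and the record layout (`formula`, `lattice`, `sites`), the function names, and the toy structures are all hypothetical. POTCAR is deliberately not written, since it is assembled from separately licensed pseudopotential files.

```python
import tempfile
from pathlib import Path

# Toy records standing in for documents fetched from the material database.
# The field names are hypothetical; the abstract does not describe the schema.
records = [
    {"formula": "NaCl",
     "lattice": [[5.64, 0, 0], [0, 5.64, 0], [0, 0, 5.64]],
     "sites": [{"element": "Na", "frac_coords": [0, 0, 0]},
               {"element": "Cl", "frac_coords": [0.5, 0.5, 0.5]}]},
    {"formula": "TiO2",
     "lattice": [[4.59, 0, 0], [0, 4.59, 0], [0, 0, 2.96]],
     "sites": [{"element": "O", "frac_coords": [0.3, 0.3, 0]},
               {"element": "Ti", "frac_coords": [0, 0, 0]},
               {"element": "O", "frac_coords": [0.7, 0.7, 0]}]},
]


def poscar_text(record):
    """Render a POSCAR: comment, scale, lattice, species, counts, coordinates."""
    sites = record["sites"]
    # Species in order of first appearance; POSCAR needs atoms grouped by species.
    species = list(dict.fromkeys(s["element"] for s in sites))
    grouped = [s for el in species for s in sites if s["element"] == el]
    counts = [sum(1 for s in sites if s["element"] == el) for el in species]
    lines = [record["formula"], "1.0"]
    lines += [" ".join(f"{x:.6f}" for x in vec) for vec in record["lattice"]]
    lines.append(" ".join(species))
    lines.append(" ".join(str(c) for c in counts))
    lines.append("Direct")  # fractional coordinates
    lines += [" ".join(f"{x:.6f}" for x in s["frac_coords"]) for s in grouped]
    return "\n".join(lines) + "\n"


def incar_text(params):
    """Render INCAR as one 'TAG = value' line per parameter."""
    return "".join(f"{tag} = {value}\n" for tag, value in params.items())


def kpoints_text(mesh):
    """Render a Gamma-centred automatic k-point mesh with no shift."""
    return "Automatic mesh\n0\nGamma\n{} {} {}\n0 0 0\n".format(*mesh)


def write_job_dirs(records, root, incar_params, mesh=(4, 4, 4)):
    """Write one directory of input files per record and return the paths."""
    job_dirs = []
    for rec in records:
        job = Path(root) / rec["formula"]
        job.mkdir(parents=True, exist_ok=True)
        (job / "POSCAR").write_text(poscar_text(rec))
        (job / "INCAR").write_text(incar_text(incar_params))
        (job / "KPOINTS").write_text(kpoints_text(mesh))
        # POTCAR is not generated here (see the note above the code).
        job_dirs.append(job)
    return job_dirs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for job in write_job_dirs(records, tmp, {"ENCUT": 520, "ISMEAR": 0}):
            print(job.name, sorted(p.name for p in job.iterdir()))
```

Each returned directory could then be handed to whatever batch scheduler the cluster uses. The submission step is left out because the abstract does not name the queue system.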
Keywords/Search Tags: high throughput computing, machine learning, material data strip thickness, MongoDB