The Research On Distributed Material Database Allocation Strategy For High-Throughput Computing

Posted on:2021-05-09

Degree:Master

Type:Thesis

Country:China

Candidate:G Wang

Full Text:PDF

GTID:2481306122468764

Subject:Computer technology

Abstract/Summary:

High-throughput multi-scale material simulation is characterized by large scale, high concurrency, and high-frequency demand, and the data it produces is dispersed, severely fragmented, and loosely structured. A traditional centralized material database cannot meet the data management needs of high-throughput material calculation or cope with massive, diverse data. A distributed database with strong concurrency capability is needed to provide information support for high-throughput material simulation. Distributed databases offer high concurrency, high scalability, high security, and high reliability, and can meet the needs of high-throughput multi-scale material computing.

Data allocation is a key issue that must be considered in the design of a distributed database system. Over the past few decades, researchers in China and abroad have explored a variety of allocation methods, but these methods show deficiencies in real transactional applications: the algorithms search inefficiently, the cost formulas are complex and hard to evaluate, and the resulting allocation is far from the optimal one. To solve the data allocation problem in a high-throughput material computing environment, this thesis proposes, based on the project requirements and existing research, an improved data allocation strategy based on the genetic algorithm. The improvements to the genetic algorithm lie mainly in three stages: population initialization, the selection operation, and the genetic operations. They overcome the tendency of the traditional genetic algorithm to fall into a local optimum at the initial stage and give the algorithm good convergence and a faster convergence speed. In addition, the crossover and mutation probabilities of the genetic operations are adjusted adaptively for each individual, which preserves population diversity in the later stages of evolution. Finally, three high-throughput material calculation environments are simulated, and three data allocation methods (the enumeration method, the genetic algorithm, and the improved genetic algorithm of this thesis) are implemented and compared in simulation experiments.

Relying on the "Key Technology and Support Platform for Material Genetic Engineering" project and the research in this thesis, a distributed data platform for material genetic engineering was constructed. The platform's database system consists of a MySQL database and a MongoDB database. The MySQL database is a local centralized database that stores core business data such as user accounts and simulation results. The MongoDB database is an unstructured distributed database deployed in four national supercomputing centers, in Changsha, Tianjin, Jinan, and Wuxi, to automate the collection, cleaning, and integration of data from high-throughput multi-scale material simulation calculations. The two databases complement each other to balance economy and efficiency.

Because the algorithm is an improvement on the genetic algorithm, it cannot completely avoid the genetic algorithm's drawbacks of instability and premature convergence: several abnormal value fluctuations and cases of premature convergence occurred during the experiments. Building on the existing work, in-depth study of the fitness function and the adaptive probabilities in the algorithm is the direction of future research.
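
To make the idea of a genetic algorithm with adaptive crossover and mutation probabilities concrete, here is a minimal Python sketch of fragment-to-site allocation. It is not the thesis's algorithm: the abstract does not give the cost formula, the initialization scheme, or the adaptive rule, so everything here is an assumption for illustration. In particular, the cost model (access frequency times inter-site transfer cost, one copy per fragment), the linear adaptive rule in `adaptive_prob`, the binary tournament selection, and all names and parameter values are hypothetical.

```python
import random


def allocation_cost(alloc, freq, dist):
    """Total access cost when fragment f is stored on site alloc[f].

    freq[s][f]: how often site s accesses fragment f (assumed workload model).
    dist[s][t]: unit transfer cost between sites s and t (assumed, 0 on the diagonal).
    """
    return sum(freq[s][f] * dist[s][alloc[f]]
               for s in range(len(freq))
               for f in range(len(alloc)))


def adaptive_prob(fit, fit_avg, fit_max, p_high, p_low):
    """Hypothetical adaptive rule: below-average individuals get p_high,
    above-average ones are scaled linearly down to p_low at the best fitness."""
    if fit < fit_avg or fit_max == fit_avg:
        return p_high
    return p_low + (p_high - p_low) * (fit_max - fit) / (fit_max - fit_avg)


def allocate(freq, dist, pop_size=30, generations=100, seed=0):
    """Search for a low-cost allocation; returns (allocation, cost)."""
    rng = random.Random(seed)
    n_sites, n_frags = len(dist), len(freq[0])

    def cost(a):
        return allocation_cost(a, freq, dist)

    # Chromosome: one gene per fragment, holding the index of its site.
    pop = [[rng.randrange(n_sites) for _ in range(n_frags)]
           for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        fits = [1.0 / (1.0 + cost(a)) for a in pop]
        fit_avg, fit_max = sum(fits) / pop_size, max(fits)

        def pick():
            # Binary tournament selection: the fitter of two random individuals.
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return i if fits[i] >= fits[j] else j

        nxt = [best[:]]  # elitism: carry the best allocation found so far
        while len(nxt) < pop_size:
            a, b = pick(), pick()
            child = pop[a][:]
            pc = adaptive_prob(max(fits[a], fits[b]), fit_avg, fit_max, 0.9, 0.6)
            if n_frags > 1 and rng.random() < pc:
                cut = rng.randrange(1, n_frags)  # single-point crossover
                child = pop[a][:cut] + pop[b][cut:]
            pm = adaptive_prob(fits[a], fit_avg, fit_max, 0.10, 0.01)
            child = [rng.randrange(n_sites) if rng.random() < pm else g
                     for g in child]
            nxt.append(child)
        pop = nxt
        best = min(pop + [best], key=cost)
    return best, cost(best)


if __name__ == "__main__":
    # Toy workload: 3 sites, 3 fragments, each fragment used mostly by one site.
    freq = [[9, 1, 0], [1, 9, 1], [0, 1, 9]]
    dist = [[0, 2, 4], [2, 0, 2], [4, 2, 0]]
    print(allocate(freq, dist))
```

The adaptive rule gives weaker individuals higher crossover and mutation probabilities and protects strong ones, which is one common way to keep diversity late in a run. For a toy instance like the one above, the result can be checked against plain enumeration of all allocations.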

Keywords/Search Tags:

High-throughput calculation, Distributed database, Material database, Data distribution, Genetic algorithm

Related items

1	Calculation Data Collection And Retrieval Analysis Database Of Catalytic Materials
2	The Distributed Management And Application Of Easymap's Data
3	Study On The Establishment Of Cutting Database For Slender Shaft
4	The Developments And Applications Of The High-throughput Computational Methods And Toolkits Combining With Machine Learning For Materials Design
5	Build Optimization And Research Of Meteorological And Atmospheric Pollution Database System
6	Research And Development Of Engineering Materials Database Based On Web Technology
7	Cutting Parameters Optimization Of Titanium Alloy And Database System Development
8	Research Of Distributed Database In Collaborative Development System Based On Cashmere Product Under Internet
9	Establishment Of First-principles Repository And High-throughput Screening Of Thermoelectric Materials
10	Data Storage And Transfer Technology Of Remote Belt Conveyor Inspection