
Research Of Enzyme Modification Methods Via Machine Learning

Posted on: 2024-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhao
Full Text: PDF
GTID: 2530307136989659
Subject: Control Science and Engineering
Abstract/Summary:
Enzymes are core substances that organisms rely on for survival, and they also play an irreplaceable role in food processing, industrial production, clinical treatment, disease diagnosis, and other fields. Many studies have found that natural enzymes often fail to meet the requirements of practical applications in terms of stability, tolerance, and selectivity, so efficient enzyme modification technologies are needed to customize enzymes with desired functions. Although directed evolution, semi-rational design, and rational design have achieved remarkable results in enzyme modification, they require a large amount of computation or experimental screening. In recent years, artificial intelligence technology represented by machine learning has made breakthrough progress in many fields, providing powerful tools for solving key problems in enzyme modification.

Building on semi-rational design, this thesis studies data-driven optimization methods for enzyme molecular modification, with the aim of reducing the time and cost of enzyme modification. Enzyme modification is modeled as a combinatorial optimization problem over a black-box function. On that basis, the thesis designs coding methods to extract protein features, develops Bayesian optimization methods to guide enzyme modification, and explores active learning methods to improve the generalization performance of the surrogate model. The effectiveness of the proposed methods is demonstrated through simulated enzyme modification experiments and real carbonyl reductase activity modification experiments.

The main research content and innovative achievements of this thesis are as follows:

1. To realize machine-learning-based directed modification of enzyme molecules, the thesis first models the task as a combinatorial optimization problem over a black-box function. Optimizing enzyme molecular modification involves combinatorial optimization, high-dimensional optimization, and batch optimization. The thesis therefore introduces Bayesian optimization to solve the problem, and studies the construction of the surrogate model and the development of the acquisition function.

2. To address the poor representational ability of existing protein coding methods, the thesis presents a new paradigm for coding methods and, based on this paradigm, designs an effective protein coding method: the low-dimensional mutual information coding method. Model fitting accuracy and simulated modification experiments on publicly available enzyme molecular modification data show that the proposed coding method significantly outperforms existing coding methods, and that its encodings benefit machine-learning-based directed modification of enzyme molecules.

3. To address the shortcomings of existing Bayesian optimization methods, the thesis proposes a new Bayesian optimization framework with termination conditions and, within this framework, develops a batch Bayesian optimization algorithm based on the maximum variance change. Using the prior information that the surrogate model gathers from historical data, the algorithm can judge whether the current global optimum of the optimization problem has already been queried, thus avoiding unnecessary evaluation experiments. Experimental results show that, compared with existing batch Bayesian optimization algorithms, the proposed algorithm has advantages in optimization performance, robustness, and convergence.

4. To improve the generalization performance of the surrogate model, the thesis proposes a pool-based batch active learning method for Gaussian process regression that uses the weight information gain as the measure of model change. The method directly measures model change through the reduction in weight uncertainty, and it also accounts for the similarity between samples in the same batch, which ensures the effectiveness of batch active learning for Gaussian processes. Experimental results show that the method can provide a "small and refined" training set for constructing the Gaussian process regression surrogate model.

This thesis systematically studies the critical issues in enzyme molecular modification from the perspective of data-driven optimization, and explores new protein coding methods and batch Bayesian optimization algorithms. The research results reduce the cost of enzyme molecular modification and provide new research ideas for protein modification, thereby supporting the further development of machine learning methods in protein engineering.
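The general workflow the abstract describes (encode sequences, fit a Gaussian process surrogate, choose a batch with an acquisition function, evaluate, repeat) can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes plain one-hot coding in place of the low-dimensional mutual information coding, an upper-confidence-bound acquisition in place of the maximum-variance-change criterion, and a fixed number of rounds in place of the proposed termination condition. The alphabet, the number of sites, `toy_fitness`, and all parameter values are hypothetical.

```python
import itertools
import numpy as np

ALPHABET = "ACDE"  # hypothetical reduced amino-acid alphabet
N_SITES = 3        # hypothetical number of mutated sites


def one_hot(seq):
    """Flat one-hot encoding (a stand-in, not the thesis's coding method)."""
    x = np.zeros(len(seq) * len(ALPHABET))
    for i, aa in enumerate(seq):
        x[i * len(ALPHABET) + ALPHABET.index(aa)] = 1.0
    return x


def rbf_kernel(A, B, length_scale=1.5):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(X_train, y_train, X_query, noise=1e-3):
    """Posterior mean and standard deviation of a GP with constant mean."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_query, X_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mu = Ks @ alpha + y_mean
    v = np.linalg.solve(L, Ks.T)
    var = np.clip(1.0 - (v**2).sum(0), 1e-12, None)  # prior variance is 1
    return mu, np.sqrt(var)


def toy_fitness(seq):
    """Hypothetical black-box 'activity'; a real assay would go here."""
    target = "DEC"
    return float(sum(a == b for a, b in zip(seq, target)))


def batch_bo(n_rounds=5, batch_size=4, beta=2.0, seed=0):
    """Pool-based batch Bayesian optimization with a UCB acquisition."""
    rng = np.random.default_rng(seed)
    pool = ["".join(p) for p in itertools.product(ALPHABET, repeat=N_SITES)]
    X_pool = np.array([one_hot(s) for s in pool])

    # Initial random batch of distinct variants.
    evaluated = {}
    for idx in rng.choice(len(pool), size=batch_size, replace=False):
        evaluated[int(idx)] = toy_fitness(pool[idx])

    for _ in range(n_rounds):
        train_idx = list(evaluated)
        y = np.array([evaluated[i] for i in train_idx])
        mu, sigma = gp_posterior(X_pool[train_idx], y, X_pool)
        ucb = mu + beta * sigma
        ucb[train_idx] = -np.inf  # never re-query an evaluated variant
        for idx in np.argsort(ucb)[-batch_size:]:
            evaluated[int(idx)] = toy_fitness(pool[idx])

    best = max(evaluated, key=evaluated.get)
    return pool[best], evaluated[best], len(evaluated)


if __name__ == "__main__":
    seq, value, n_evals = batch_bo()
    print(f"best variant {seq} with activity {value} after {n_evals} evaluations")
```

Picking the top-scoring candidates independently, as done here, ignores similarity within a batch; the thesis's batch methods address that issue explicitly, so this sketch should be read only as a baseline illustration of the loop.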
Keywords/Search Tags:Enzyme modification, protein coding methods, batch Bayesian Optimization, batch active learning, combinatorial optimization, high dimensional optimization