Theoretical Estimation of Intracellular Concentration of Metabolites in Micro-organisms

Posted on: 2018-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: H F Yang
GTID: 2480305966457144
Subject: Chemical Engineering and Technology
Abstract/Summary:
With the rapid growth of bioengineering, metabolic network models have been widely used in metabolic engineering and synthetic biology, and these models require intracellular metabolite concentrations as initial parameters. However, intracellular metabolites are difficult to extract and quantify experimentally, and many of them are present at very low concentrations. Complementing experimental methods, computational methods have been used as effective tools for studying intracellular metabolite concentrations. In this study, data mining methods were employed to build a metabolite concentration prediction model from a feature subset comprising chemical descriptors, metabolic network topological parameters, and metabolic pathway features.

Before model construction, a genetic algorithm (GA) was used to select the optimal feature subset from a training set of 91 samples and 1669 features. The procedure was validated internally by leave-one-out (LOO) cross-validation and externally on an independent test set of 39 samples. Because the results were consistent across the training and test sets, the 14 features finally selected can be considered effective predictors of metabolite concentration.

Because different machine learning algorithms have different strengths in different studies, four algorithms (improved naïve Bayes, back-propagation neural network, random forest, and support vector machine (SVM)) were also used to build predictive models on the same features, with the aim of identifying the best algorithm for metabolite concentration prediction. On the training set the algorithms showed different strengths, whereas on the test set the SVM model with a Gaussian kernel performed clearly better. The other three algorithms performed similarly to SVM models with polynomial and sigmoid kernels, which indicates the particular advantage of the Gaussian kernel in mapping features into a high-dimensional space. The Gaussian-kernel SVM was therefore judged the most suitable algorithm for predicting intracellular metabolite concentrations. The resulting SVM model is robust and has good predictive power (R² = 0.74, RMSE = 0.73, Q² = 0.57; R_p² = 0.59, RMSE_p = 0.70, Q_p² = 0.58); it performs significantly better than existing algorithms and has broader coverage.
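
The abstract does not describe how the genetic algorithm (GA) feature selection step is encoded. A common formulation, sketched here in Python purely as an illustration, represents each candidate feature subset as a binary mask over all features (1669 in the thesis) and evolves a population of masks toward higher fitness. The population size, generation count, crossover and mutation rates, binary tournament selection, one-point crossover, and the `toy_fitness` objective are all assumptions, not the thesis's settings; in a real run the fitness would typically be a cross-validated score of a model trained on the selected features.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def ga_select(n_features: int,
              fitness: Callable[[Mask], float],
              pop_size: int = 30,
              generations: int = 40,
              crossover_rate: float = 0.8,
              mutation_rate: float = 0.02,
              seed: int = 0) -> Mask:
    """Return the best feature mask found by a simple generational GA.

    All hyperparameter defaults are illustrative, not the thesis settings.
    Requires n_features >= 2 (one-point crossover needs a cut point)
    and pop_size >= 2 (tournaments sample two individuals).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best, best_score = pop[0][:], fitness(pop[0])

    def tournament(scored):
        # Binary tournament: return the fitter of two random individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for gen in range(generations + 1):
        scored = [(fitness(ind), ind) for ind in pop]
        for score, ind in scored:
            if score > best_score:
                best, best_score = ind[:], score
        if gen == generations:
            break  # final population has been scored; stop breeding
        children = [best[:]]  # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    return best


def toy_fitness(mask: Mask) -> float:
    # Hypothetical objective: features 0-2 are "informative";
    # every other kept feature costs 0.1, favouring small subsets.
    return sum(mask[:3]) - 0.1 * sum(mask[3:])


if __name__ == "__main__":
    print(ga_select(10, toy_fitness))
```

Carrying the best mask forward unchanged (elitism) means the best fitness found never decreases from one generation to the next; the penalty term in the toy objective mimics the usual pressure toward a compact subset, such as the 14 features retained in the thesis.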
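
The validation statistics reported for the final model can be computed as in the following minimal sketch. It assumes the common definitions R² = 1 - SS_res/SS_tot, RMSE = sqrt(mean squared error), and leave-one-out Q² = 1 - PRESS/SS_tot, where PRESS sums the squared error of each sample when it is predicted by a model refit without it. The abstract does not state its exact formulas, and `fit_line` is a hypothetical stand-in model, not the SVM used in the thesis.

```python
import math
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], float]
Trainer = Callable[[List[Row], List[float]], Predictor]


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination of predictions y_hat against observations y."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def q2_loo(X: List[Row], y: List[float], train: Trainer) -> float:
    """Leave-one-out Q^2 = 1 - PRESS / sum((y_i - mean(y))^2)."""
    press = 0.0
    for i in range(len(y)):
        # Refit on every sample except i, then predict the held-out sample.
        model = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - model(X[i])) ** 2
    mean = sum(y) / len(y)
    return 1.0 - press / sum((v - mean) ** 2 for v in y)


def fit_line(X: List[Row], y: List[float]) -> Predictor:
    """Hypothetical stand-in model: least-squares line on the first feature."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = (sum((x - mx) * (v - my) for x, v in zip(xs, y))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda row: intercept + slope * row[0]
```

Any trainer with the same shape can be passed to `q2_loo`; with 91 training samples, LOO refits the model 91 times, once per held-out sample.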
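
The comparison of SVM kernels rests on the kernel functions themselves. The sketch below writes out the Gaussian, polynomial, and sigmoid kernels in the parameterization used by LIBSVM (`gamma`, `coef0`, `degree`); the default parameter values are arbitrary and are not the values tuned in the thesis. The Gaussian kernel, also called the radial basis function (RBF) kernel, depends only on the distance between two samples and implicitly corresponds to an infinite-dimensional feature space, which is one way to read the point about mapping features into a high-dimensional space.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(x: Vector, z: Vector) -> float:
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x: Vector, z: Vector, gamma: float = 0.5) -> float:
    # exp(-gamma * ||x - z||^2): equals 1 for identical samples, decays with distance.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def polynomial_kernel(x: Vector, z: Vector, gamma: float = 1.0,
                      coef0: float = 1.0, degree: int = 3) -> float:
    # (gamma * <x, z> + coef0) ** degree
    return (gamma * dot(x, z) + coef0) ** degree


def sigmoid_kernel(x: Vector, z: Vector, gamma: float = 0.1,
                   coef0: float = 0.0) -> float:
    # tanh(gamma * <x, z> + coef0)
    return math.tanh(gamma * dot(x, z) + coef0)


def gram(X: List[Vector], kernel) -> List[List[float]]:
    """Kernel (Gram) matrix K[i][j] = kernel(X[i], X[j]) used when training an SVM."""
    return [[kernel(a, b) for b in X] for a in X]
```

An SVM never needs the high-dimensional feature vectors explicitly; it only needs the Gram matrix, so swapping the kernel function is enough to compare the three variants on the same features.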
Keywords/Search Tags: machine learning, metabolic engineering, concentration prediction, metabolic pathway, genetic algorithm, support vector machine, naïve Bayes, BP neural network, random forest