Study On Influence Factors Of Enzyme Thermostability Using Machine Learning

Posted on: 2010-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zhang
Full Text: PDF
GTID: 2178360278975361
Subject: Computer application technology
Abstract/Summary:
Thermophilic enzymes have attracted more attention than mesophilic enzymes because they remain functional at high temperatures. They are difficult to prepare, however, which probably adds to the interest in them: they are obtained by screening thermophilic microorganisms, and the yield is very low. In spite of this, thermophilic enzymes are widely used in food, medicine, environmental protection, and metal smelting. The idea of this thesis is to study the molecular mechanism of thermophilic enzymes with machine learning, in order to understand protein folding and to find ways of improving the thermostability of mesophilic enzymes through protein engineering. The support vector machine (SVM) was chosen to study enzyme thermostability after a comparison of SVM with artificial neural networks. Amino acid composition is one of the primary factors affecting protein thermostability, so the percentages of the 20 amino acids in each protein sequence were used as the SVM feature vector to predict protein thermostability. After a comparison of SVM kernel functions, the radial basis function (RBF) was chosen to train the SVM, and the accuracy was 85.4%. The SVM was then optimized by a geometrical method, by SVM-KNN, and by iterative training; the accuracies were 88.2%, 86.1%, and 86.1%, respectively. The geometrical method gave the largest increase, 2.8 percentage points, so it was chosen to optimize the SVM parameters. The amino acid percentage data were then divided into 4 groups according to amino acid polarity. With amino acid polarity as the feature vector, the original SVM and the parameter-optimized SVM reached accuracies of 72.2% and 76.4%, respectively, an increase of 4.2 percentage points. In the dipeptide-based prediction experiments, the accuracy with amino acid polarity as the feature vector was 71.9%.
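The two primary-structure feature vectors described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `composition` and `polarity_composition` are hypothetical, and the four polarity groups follow a common textbook scheme (nonpolar, polar uncharged, acidic, basic), since the abstract does not list the exact grouping used in the thesis.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed four-way polarity grouping (a common textbook scheme);
# the thesis abstract does not state which residues fall in which group.
POLARITY_GROUPS = {
    "nonpolar": "AVLIPFWM",
    "polar_uncharged": "GSTCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}


def composition(sequence):
    """Return the percentage of each of the 20 standard amino acids."""
    # Ignore anything that is not a standard one-letter residue code.
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return [100.0 * counts[aa] / len(residues) for aa in AMINO_ACIDS]


def polarity_composition(sequence):
    """Collapse the 20 percentages into 4 polarity-group percentages."""
    percent = dict(zip(AMINO_ACIDS, composition(sequence)))
    return [
        sum(percent[aa] for aa in members)
        for members in POLARITY_GROUPS.values()
    ]
```

Each sequence then yields a 20-dimensional or a 4-dimensional vector that could be passed to an RBF-kernel SVM; the classifier itself and its parameter optimization are not shown here.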
For higher-order structure, the feature vectors were hydrogen bonds, salt bridges, volume, and temperature factor (B-factor); the accuracies were 81.3%, 88.9%, 55.8%, and 59.0%, respectively. Finally, the aim of the cyclomaltodextrin glucanotransferase (CGTase) mutation experiment was to improve the thermostability of CGTase at high temperature without changing its function. The first step was to calculate the salt bridges of the original CGTase and of the mutated CGTase. The training set was the salt bridge data of the higher-order structures. The testing set was formed from the salt bridges of the original and mutated CGTase together with a randomly chosen part of the salt bridge data. The SVM result was measured as the accuracy of correctly classifying the original and mutated CGTase. Over 100 iterations, the original and mutated CGTase were classified correctly every time. The following conclusions can be drawn: (1) In the primary structure, the key factors of enzyme thermostability are amino acid percentage, amino acid polarity, and dipeptide composition. (2) Hydrogen bonds and salt bridges are important factors for enzyme thermostability in higher-order structure, whereas volume and temperature factor are not closely related to it. (3) The classifier with salt bridges as the feature vector is correct, and the CGTase mutation experiment is successful.
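As a rough illustration of how a salt bridge feature might be computed, the sketch below counts oppositely charged side-chain atom pairs that lie within a distance cutoff. Everything here is an assumption rather than the thesis's procedure: the function name `count_salt_bridges` is hypothetical, the 4.0 Å cutoff is a commonly used criterion that the abstract does not confirm, and the coordinates are assumed to have been extracted from a structure file beforehand.

```python
import math


def count_salt_bridges(acidic_atoms, basic_atoms, cutoff=4.0):
    """Count acidic/basic atom pairs no farther apart than `cutoff` angstroms.

    acidic_atoms: (x, y, z) coordinates of negatively charged side-chain
                  atoms, e.g. carboxylate oxygens of Asp and Glu.
    basic_atoms:  (x, y, z) coordinates of positively charged side-chain
                  atoms, e.g. side-chain nitrogens of Lys, Arg, and His.
    cutoff:       assumed distance criterion; not taken from the thesis.
    """
    return sum(
        1
        for a in acidic_atoms
        for b in basic_atoms
        if math.dist(a, b) <= cutoff
    )
```

This version counts atom pairs, so a residue pair with several close atoms contributes more than once; counting each residue pair once would be an equally plausible design choice.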
Keywords/Search Tags: SVM, parameters optimization, structural parameters, CGTase