Computational Prediction And Functional Analysis Of Lysine Acylation Sites

Posted on: 2021-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Wang
Full Text: PDF
GTID: 2404330602476970
Subject: Applied Chemistry
Abstract/Summary:
Post-translational modification of proteins is a dynamic and reversible process that plays an important role in a variety of physiological and pathological functions across prokaryotes and eukaryotes. Although more than 600 post-translational modifications of proteins have been discovered, most of them occurring on lysine and arginine, the role of many modified substrates in pathology and physiology is still largely unknown. Accurately identifying modification sites is a crucial step toward understanding the mechanisms of the underlying biological processes, and it is of great significance for discovering drug targets for human diseases. Compared with traditional experimental research, which is expensive, labor-intensive, and time-consuming, computational methods have attracted more attention in recent years because of their convenience, high efficiency, and high accuracy. In this thesis, we developed online computational prediction tools for two recently discovered post-translational modifications (2-hydroxyisobutyrylation and butyrylation) and performed a series of proteomic analyses. The details are as follows:

1. Based on the optimal feature dataset constructed by random forest, a novel species-specific online prediction tool named KhibPred (http://bioinfo.ncu.edu.cn/KhibPred.aspx) was developed. Lysine 2-hydroxyisobutyrylation is closely related to the regulation of various diseases and biological mechanisms, such as bladder cancer, lipid metabolism, glycolysis/gluconeogenesis, the TCA cycle, and protein biosynthesis and processing. Therefore, we collected up-to-date and reliable datasets for four species or sample types: Saccharomyces cerevisiae (S. cerevisiae), Physcomitrella patens (P. patens), rice seeds (R. seeds), and HeLa cells (H. cells). Three types of features, namely sequence-based information, physicochemical properties, and evolutionary-derived information, were then used to represent a wide range of protein sequence fragments. In addition, six representative machine learning methods (including support vector machine, random forest, decision tree, Gaussian Bayes, and k-nearest neighbors (KNN)) were employed to build models and make systematic comparisons. The results show that the support vector machine-based model outperforms the other classifiers. Finally, an online prediction tool based on the support vector machine was constructed. Evaluation by cross-validation and on an independent test set demonstrated that KhibPred performs robustly and gives satisfactory results.

2. Computational identification and functional analysis of lysine butyrylation based on a multi-feature optimization strategy. Lysine butyrylation can not only elicit structural and functional changes of chromatin that regulate a variety of epigenetic processes, but also induce butyryl-CoA in energy metabolism and cell signaling. Based on a recently published dataset of experimentally verified butyrylation sites, three different feature selection strategies were applied to construct the optimal feature subset for training the model, and the information gain method was employed to select the optimal window size according to the combination of five features. A novel online tool for predicting lysine butyrylation sites, named LBP, was developed, and it is freely available at http://bioinfo.ncu.edu.cn/LBP.aspx. Based on the results of 5-fold cross-validation and the independent test set, LBP was shown to have good prediction performance. Compared with a single feature, the feature optimization algorithm effectively improves the model's prediction performance. The online prediction tool provides an effective supplement for further study of butyrylation. In addition, a pivotal framework is proposed to systematically analyze the biological connections and biological functions of butyrylated substrate proteins. The analysis results suggest that butyrylation is closely associated with protein metabolic processes and ligase activity.
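As a rough illustration of two steps mentioned in the abstract, the sketch below extracts fixed-length sequence fragments centered on each lysine (K) and computes the information gain of a categorical feature against binary site labels. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the half-window width, the "X" padding character, and the toy data are all hypothetical, and the abstract does not specify how information gain was applied to the window-size choice.

```python
import math
from collections import Counter


def lysine_windows(sequence, half_width=3, pad="X"):
    """Return (1-based position, fragment) for every lysine in `sequence`.

    Each fragment has length 2 * half_width + 1 with the lysine at its center.
    Termini are padded with `pad` (an assumed convention, not from the thesis).
    """
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Index i in `sequence` sits at index i + half_width in `padded`,
            # so the window starts at index i in `padded`.
            windows.append((i + 1, padded[i:i + 2 * half_width + 1]))
    return windows


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


if __name__ == "__main__":
    # Toy protein sequence (hypothetical), half-window of 2 residues.
    for position, fragment in lysine_windows("MKTAYK", half_width=2):
        print(position, fragment)

    # Toy feature: the residue immediately before the lysine in four fragments,
    # with made-up labels (1 = modified site, 0 = unmodified site).
    print(information_gain(["A", "A", "G", "G"], [1, 1, 0, 0]))
```

In a full pipeline, fragments like these would be encoded with the sequence-based, physicochemical, and evolutionary features described above before being passed to a classifier; that encoding is not shown here.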
Keywords/Search Tags: post-translational modification (PTM), 2-hydroxyisobutyrylation, butyrylation, support vector machine, random forest, elastic net