Prediction Of Protein Khib Modification Sites Based On Deep Learning

Posted on: 2022-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: L N Zhang
GTID: 2510306566991319
Subject: Software engineering
Abstract/Summary:
Protein post-translational modifications (PTMs) are chemical changes that occur after a protein is synthesized. They can affect the structure and electrophilicity of proteins, control key mechanisms such as protein stability and localization, and regulate many biological functions. Recently, lysine 2-hydroxyisobutyrylation (Khib) has been identified in different organisms, where it is involved in several biological functions, including amino acid biosynthesis, carbon catabolism, glycolysis and transcription. Identifying Khib sites is the key step towards understanding its regulatory mechanism. At present, tens of thousands of Khib sites have been identified in various species (Homo sapiens, Oryza sativa, Toxoplasma gondii, Physcomitrella patens and Saccharomyces cerevisiae). For example, 11,976 sites were found in Physcomitrella patens and 9,916 sites in Oryza sativa. However, only two predictors have been developed, both based on traditional machine learning: iLys-Khib for the human data set, and Khibpred for the data sets of the four other species except Toxoplasma gondii. Existing predictors are therefore limited in the species they cover. To further improve prediction accuracy and cover more species, in this thesis I analyze and develop an efficient and universal model for the prediction of Khib sites. The work can be summarized in the following four points:
(1) The Khib data set was constructed by collecting data from biochemical experiments. A non-redundant data set was retained after a series of data-cleaning steps.
(2) Features were extracted and selected from the Khib data. Based on different feature-encoding methods, nine traditional machine-learning classifiers were constructed to identify Khib sites. Combinations of different features compared favourably to the individual features. Among the deep-learning algorithms, the CNN with one-hot encoding showed the best performance and robustness. Moreover, the deep-learning models performed better than the conventional machine-learning ones.
(3) The sequence characteristics of the different species were analyzed and summarized. Accordingly, species-specific models and a general model were constructed and compared. Furthermore, a model named DeepKhib was built for the prediction of lysine 2-hydroxyisobutyrylation sites, and DeepKhib performed better than the previously developed models.
(4) An online version of DeepKhib was developed and is freely accessible...
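The one-hot encoding mentioned in point (2) can be illustrated with a minimal sketch. This is not the thesis implementation: the window size (`flank`), the padding symbol `-`, and the function names `lysine_windows` and `one_hot` are all hypothetical choices made for illustration. The sketch cuts a fixed-length window around each lysine (K) in a protein sequence and turns each residue into a 0/1 vector, which is the kind of matrix a CNN could take as input.

```python
# Illustrative sketch only: window size, padding symbol and all names below
# are assumptions, not details taken from the thesis.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"                              # assumed symbol for positions past the termini
ALPHABET = AMINO_ACIDS + PAD
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def lysine_windows(sequence, flank=3):
    """Yield (position, window) for each lysine, padding at the sequence ends."""
    padded = PAD * flank + sequence + PAD * flank
    for pos, residue in enumerate(sequence):
        if residue == "K":
            # Position pos in the sequence sits at pos + flank in the padded
            # string, so this slice is centred on the lysine.
            yield pos, padded[pos:pos + 2 * flank + 1]


def one_hot(window):
    """Encode a window as a list of rows, one 21-element 0/1 row per residue."""
    rows = []
    for residue in window:
        row = [0] * len(ALPHABET)
        # Residues outside the alphabet are mapped to the padding column.
        row[INDEX.get(residue, INDEX[PAD])] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for pos, window in lysine_windows("MKTAYK", flank=2):
        print(pos, window, len(one_hot(window)))
```

Each candidate site becomes a matrix with one row per window position and 21 columns; labelled matrices of this shape would then be stacked into the training set for a classifier.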
Keywords/Search Tags: bioinformatics, machine learning, deep learning, post-translational modification, lysine 2-hydroxyisobutyrylation