Study On Lysine Acetylation Site Prediction Based On Deep Transfer Learning

Posted on: 2020-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: J G Li  Full Text: PDF
GTID: 2370330596470885  Subject: Computer application technology
Abstract/Summary:
Protein post-translational modification (PTM) is one of the most important research topics in computational biology. It alters the properties of a protein by adding functional groups to one or more amino acids. Acetylation is the reaction that introduces an acetyl group (CH3CO-) onto a nitrogen, oxygen, or carbon atom. As one of the most important post-translational modifications, acetylation plays a key role in a variety of biological functions, such as transcriptional regulation, cytokine signaling, and apoptosis. Determining whether an amino acid residue will undergo acetylation, and understanding the mechanism of acetylation, are important for understanding the expression of cellular genetic information and the regulation of biological mechanisms.

Existing methods for identifying protein acetylation sites fall into two broad categories: mass spectrometry and computational methods. Mass-spectrometry-based experimental methods can find acetylation sites in eukaryotes, but they can be time-consuming and expensive. It is therefore necessary to develop a computational method that can identify protein acetylation sites efficiently and accurately. Existing computational methods usually rely on feature engineering, so data collection and feature extraction strongly affect the accuracy of acetylation site prediction: redundant features introduce redundancy, and irrelevant features lead to prediction errors. To address these problems, this thesis uses a deep learning framework for acetylation site prediction, which can mine latent features from large-scale training datasets through multi-layer networks and nonlinear mappings.

Specifically, this thesis proposes a dual-model deep learning architecture for acetylation site prediction. Data were first collected from the Protein Lysine Modification Database (PLMD), including general acetylation data and acetylation data for three species, and were divided into training, validation, and independent test sets. Two kinds of features were then extracted from the data: protein sequence information and physicochemical properties. A separate network was first trained for each feature type, and the two networks were then merged to improve prediction accuracy, with Bayesian optimization used to tune the parameters. To handle species-specific data with small dataset sizes, transfer learning was used to transfer the network to each species-specific dataset, which also achieved good results. The experimental results demonstrate the effectiveness of the network: accuracy is 70.8%, sensitivity is 72.3%, specificity is 70.7%, and the MCC is 0.251. Performance on the species-specific data is also better than that of other tools, indicating that the network can be applied to acetylation site prediction.
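The four evaluation measures reported above (accuracy, sensitivity, specificity, and the Matthews correlation coefficient, MCC) are all derived from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not code from the thesis. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and MCC from confusion-matrix counts.

    tp/fn: acetylated sites predicted positive/negative.
    tn/fp: non-acetylated sites predicted negative/positive.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity is the fraction of true acetylation sites that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity is the fraction of non-acetylated sites correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC uses all four counts; it is set to 0 here when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only; these are NOT results from the thesis.
metrics = binary_metrics(tp=80, fp=30, tn=70, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because MCC accounts for all four cells of the confusion matrix, it is less sensitive to class imbalance than accuracy alone, which is one reason it is commonly reported alongside sensitivity and specificity for modification site prediction.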
Keywords/Search Tags: Protein post-translational modification, acetylation, deep learning, transfer learning