
Prediction Of Protein Serine ADP-ribosylation Modification Sites Based On Deep Learning

Posted on: 2022-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wei
GTID: 2510306566491294
Subject: Software engineering
Abstract/Summary:
Protein post-translational modifications (PTMs) increase the functional diversity of the proteome through the covalent addition of functional groups or proteins. PTMs regulate the activity of most eukaryotic proteins and thereby play a key role in most biological processes. Among them, ADP-ribosylation (ADPr) is a key regulator: modification of amino acids at its protein targets regulates key cellular pathways in eukaryotes and underlies the pathogenicity of certain bacteria. Accurately identifying ADP-ribosylation sites in large numbers of protein sequences would therefore not only advance research on the function and biological effects of ADP-ribosylation but also support clinical trials and drug research.

Like most post-translational modifications, ADP-ribosylation has a very low cellular abundance, and the rapid turnover of the modification in the body adds further complexity. Identifying ADP-ribosylation sites experimentally is therefore difficult, laborious, and expensive. As machine learning continues to advance, there is a need to develop a computational model to identify ADP-ribosylation sites. In view of the current state of ADP-ribosylation site prediction, the research in this thesis covers the following four parts:

(1) A standard data set of serine ADP-ribosylation sites was constructed for the first time. Experimental data based on mass spectrometry were collected and matched against the standard protein sequence database (UniProt), and a five-step data cleaning method was applied to meet the requirements for sequence data quality.

(2) For feature extraction, 4 classifiers based on traditional machine learning algorithms and 8 classifiers based on deep learning algorithms were developed, using different methods of extracting sequence pattern features. The results show that the random forest model based on binary encoding has the best predictive performance among the traditional machine learning models, and that the two convolutional neural network models, based on one-hot encoding and on word embedding vectors, have the best performance among all models.

(3) For serine ADP-ribosylation site prediction, the two best deep learning models were combined to construct the prediction model DeepADPrS. In ten-fold cross-validation and independent testing, DeepADPrS performed best among all models, with areas under the ROC curve (AUC) of 0.935 and 0.932, respectively; visualization further illustrates its prediction performance. To assess DeepADPrS further, this thesis also uses published modification site data for aspartic acid (Asp, D) and glutamic acid (Glu, E). The results show that DeepADPrS has the highest predictive performance.

(4) An online website for serine ADP-ribosylation site prediction was developed based on DeepADPrS.
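
The one-hot encoding of sequence patterns mentioned in part (2) can be illustrated with a short sketch. The abstract does not state the window length, the padding scheme, or any implementation details, so everything below is an assumption made for illustration: the function names `serine_windows` and `one_hot`, the `flank` parameter, and the padding symbol `X` are hypothetical and are not taken from the thesis. The sketch extracts a fixed-length window centred on each serine and encodes each residue as a 20-dimensional 0/1 vector.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed padding symbol for positions beyond the sequence ends


def serine_windows(sequence, flank=3):
    """Yield (1-based position, window) for every serine in the sequence.

    Each window holds `flank` residues on either side of the serine;
    `flank` is an illustrative choice, not a value from the thesis.
    """
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue == "S":
            # Residue i sits at index i + flank in `padded`, so this
            # slice of length 2 * flank + 1 is centred on it.
            yield i + 1, padded[i:i + 2 * flank + 1]


def one_hot(window):
    """Encode a window as a list of 20-dimensional 0/1 vectors.

    The padding symbol (or any non-standard residue) maps to all zeros.
    """
    return [[1 if residue == aa else 0 for aa in AMINO_ACIDS]
            for residue in window]
```

For example, `list(serine_windows("MKSAS", flank=2))` gives the candidate sites at positions 3 and 5 with windows `"MKSAS"` and `"SASXX"`. Passing each window to `one_hot` yields a matrix of shape (window length) x 20 that could serve as input to a classifier.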
Keywords/Search Tags: Bioinformatics, feature extraction, deep learning, post-translational modification, serine