Prediction Of Glutarylation Sites In A Machine Learning Framework

Posted on: 2024-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: M W Sun
Full Text: PDF
GTID: 2530306911993879
Subject: Computer Science and Technology
Abstract/Summary:
Protein post-translational modification (PTM) plays a key role in orchestrating biological processes and functions, and it occurs widely in the proteins of animals and plants. Glutarylation is a type of post-translational modification that occurs at the active ε-amino groups of specific lysine residues in proteins and is associated with various human diseases, including diabetes, cancer, and glutaric aciduria type I. Predicting glutarylation sites is therefore particularly important. With the advancement of computer science, computational prediction of PTM sites has emerged as a new area of study that can address the drawbacks of traditional experimental methods, which are costly and time-consuming. This work uses deep learning and machine learning techniques to study the prediction of glutarylation sites in proteins.

Only a very small fraction of the amino acid residues (i.e., sites) in a protein sequence can undergo post-translational modification. As a result, the number of amino acid sequences with modification sites (positive samples) is far smaller than the number of sequences without them (negative samples), and this imbalance is seriously detrimental to prediction. To address the imbalance between positive and negative glutarylation samples, this work developed the Close Screening algorithm, which is based on screening unlabeled samples. A prediction model named CS-iGlu was proposed that combines this algorithm with LightGBM, an ensemble learning classification model. Experiments show that this model has certain advantages in predicting positive samples, with sensitivity (Sn), specificity (Sp), accuracy (ACC), Matthews correlation coefficient (MCC), and area under the curve (AUC) of 78.27%, 69.16%, 73.53%, 0.4755, and 0.8127, respectively, in ten-fold cross-validation.

To address the limited generalization ability of the CS-iGlu model, this work developed a new deep learning-based prediction model for glutarylation sites named DeepDN_iGlu, which adopts an attention residual learning method and DenseNet. The focal loss function is used in place of the traditional cross-entropy loss function to address the substantial imbalance between the numbers of positive and negative samples. Using only the straightforward one-hot encoding method, DeepDN_iGlu shows greater potential for glutarylation site prediction, with Sn, Sp, ACC, MCC, and AUC of 89.29%, 61.97%, 65.15%, 0.33, and 0.80, respectively, on the independent test set. To the best of the authors' knowledge, this is the first time that DenseNet has been used for the prediction of glutarylation sites. DeepDN_iGlu has been deployed as a web server (https://bioinfo.wugenqiang.top/~smw/DeepDN_iGlu/) to make glutarylation site prediction more accessible.
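
The abstract names two concrete ingredients of the deep learning model: one-hot encoding of the input sequence and focal loss in place of cross-entropy. The following is a minimal sketch of both, not the thesis implementation. It assumes a 20-letter amino acid alphabet with any other character (such as a padding symbol) mapped to an all-zero row, and the standard binary focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) with the commonly used defaults alpha = 0.25 and gamma = 2. The function names, the example peptide, and these hyperparameters are illustrative assumptions; the abstract does not state the window length or the settings actually used.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide):
    """Encode a peptide as a (length, 20) binary matrix.

    Characters outside the assumed alphabet (e.g. a padding 'X')
    are left as all-zero rows.
    """
    encoded = np.zeros((len(peptide), len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(peptide):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            encoded[pos, idx] = 1.0
    return encoded


def binary_focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p_t is the predicted probability of the true class, so confidently
    correct samples are down-weighted by the (1 - p_t)**gamma factor.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    p_pred = np.clip(np.asarray(p_pred, dtype=np.float64), eps, 1.0 - eps)
    p_t = np.where(y_true == 1, p_pred, 1.0 - p_pred)
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


if __name__ == "__main__":
    # Hypothetical peptide window; 'X' stands in for padding.
    features = one_hot_encode("AKLMXKDE")
    print(features.shape)

    labels = [1, 0, 0, 0]            # toy labels, not thesis data
    probs = [0.30, 0.10, 0.20, 0.60]  # toy predicted probabilities
    print(binary_focal_loss(labels, probs))
```

With gamma set to 0 and alpha set to 0.5, the focal loss reduces to half the ordinary cross-entropy, which makes the relationship between the two losses easy to check.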
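
For reference, four of the five reported metrics have standard definitions in terms of the confusion matrix. The sketch below computes Sn, Sp, ACC, and MCC from binary labels and thresholded predictions. The function name and the toy labels are illustrative and are not data from the thesis; AUC is omitted because it requires ranked scores rather than thresholded predictions.

```python
import math


def confusion_metrics(y_true, y_pred):
    """Return Sn, Sp, ACC, and MCC for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    acc = (tp + tn) / total if total else 0.0   # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


if __name__ == "__main__":
    # Toy example with more negatives than positives.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(confusion_metrics(y_true, y_pred))
```

On imbalanced data, ACC is dominated by the majority class, which is why Sn, Sp, and MCC are usually reported alongside it.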
Keywords/Search Tags: glutarylation, Close Screening, LightGBM, DenseNet, deep learning