Predicting Multi-label Protein Subcellular Location Based On Deep Learning

Posted on:2023-11-27

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Deng

GTID:2530307070973739

Subject:Applied statistics

Abstract/Summary:

As a branch of proteomics research, protein subcellular location prediction plays an important role in many areas, such as exploring the specific functions of proteins and the mechanisms of protein interaction. This thesis focuses on predicting multi-label protein subcellular location with deep learning. The main work is as follows.

First, we extract 3622 immunohistochemistry (IHC) images from the Human Protein Atlas database according to certain rules and separate the protein channel from each IHC image. We extract features with the local binary pattern (LBP) method and then transform the multi-label problem into a multi-class problem. Next, we predict with SVM, KNN and XGBoost; for better prediction, we generalize the binary focal loss to a multi-class focal loss and use it as the loss function of XGBoost. SVM reaches the highest accuracy, 77.4%, but its Macro-F1 is 0. XGBoost with focal loss performs well: its Macro-F1 is 12.48% and its accuracy is 60.4%. However, the simplest method, KNN, performs best, with an accuracy of 69.4% and a Macro-F1 of 16.54%. Analysis of the results suggests that prediction ability may be limited by the imbalanced distribution within each label and by label co-occurrence.

Second, we train ResNet-18 on the training set with BCELoss. During training, we select as the final model the one with the largest metric value, where the metric is Macro-Recall multiplied by the minimum per-label Recall, under the condition that the accuracy on the validation set is greater than 60%. The accuracy of this model on the test set is 59.4%. We also propose a new loss function to address within-label imbalance and label co-occurrence. A network with the newly proposed loss function is trained and optimized under the same conditions, and its accuracy on the test set is 58.7%, almost the same as that of the model trained with BCELoss. However, its Macro-Recall and Macro-F1 increase by 9.3% and 4.7% respectively, and for the cytoplasmic and membranous labels in particular, Recall increases by 16% and 24% respectively. We therefore believe that the newly proposed loss function significantly improves the prediction of multi-label proteins relative to BCELoss.

Finally, we summarize the work of this thesis and give an outlook on follow-up research.
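
The model-selection metric described in the abstract (Macro-Recall multiplied by the minimum per-label Recall, considered only when validation accuracy exceeds 60%) can be sketched as follows. This is a minimal illustration, not the thesis code: the function names are hypothetical, "accuracy" is assumed to mean exact-match (subset) accuracy over all labels, and a label with no positive samples is assumed to have recall 0.

```python
import numpy as np


def per_label_recall(y_true, y_pred):
    """Recall for each label column of a binary indicator matrix."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    true_pos = (y_true & y_pred).sum(axis=0)
    positives = y_true.sum(axis=0)
    # Assumption: a label with no positive samples gets recall 0.
    return np.divide(
        true_pos,
        positives,
        out=np.zeros(y_true.shape[1]),
        where=positives > 0,
    )


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly.

    Assumption: the abstract does not define accuracy; exact match is
    one common choice for multi-label problems.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return float((y_true == y_pred).all(axis=1).mean())


def selection_score(y_true, y_pred, min_accuracy=0.60):
    """Macro-Recall times minimum per-label Recall.

    Returns None when validation accuracy is not above the threshold,
    so such a model is never selected.
    """
    if subset_accuracy(y_true, y_pred) <= min_accuracy:
        return None
    recalls = per_label_recall(y_true, y_pred)
    return float(recalls.mean() * recalls.min())


# Toy validation set with two labels and four samples.
y_val = [[1, 0], [1, 1], [0, 1], [1, 0]]
y_hat = [[1, 0], [1, 1], [0, 1], [0, 0]]
score = selection_score(y_val, y_hat)
```

In a training loop, this score would be computed on the validation set after each epoch, and the checkpoint with the largest non-None score kept. Multiplying by the minimum recall penalizes models that ignore a rare label, which matches the imbalance concern raised in the abstract.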

Keywords/Search Tags:

multi-label protein subcellular prediction, deep learning, label co-occurrence, label imbalance

Related items

1	A Method And Its Application Research For Protein Subcellular Localization Prediction Based On Multi-label Learning
2	Prediction Of Protein Subcellular Localization By Using Machine Learning Method And Its Application
3	Using Multi-label Learning Methods To Study Protein Subcellular Localization Prediction
4	Research On Prediction Of Sequence-based Multilocus Subcellular Localization
5	Research On Protein Subcellular Localization Prediction Under Multi-label Setting
6	Research And Implementation Of Protein Subcellular Localization Prediction System Based On Ensemble Multi-label Learning
7	Protein Subcellular Localization Prediction From Multi-label Learning
8	Application Of Class Imbalance Learning In Protein Subcellular Localization
9	A Multi-label Classifier Based On PSSM And GO For Predicting Protein Subcellular Localization
10	Research On Protein Subcellular Location Method Based On Deep Learning