Research On The Prediction Method Of Secretory Proteins Based On Deep Learning

Posted on: 2022-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Sun
Full Text: PDF
GTID: 2480306332952489
Subject: Software engineering
Abstract/Summary:
Accurately measuring and evaluating biomarkers as indicators for differentiating normal and disease samples is important for detecting diseases, making prognoses, and studying the mechanisms of disease occurrence. At present, detecting biomarkers in body fluids such as blood, urine and saliva is an effective method for diagnosing diseases. Saliva is an attractive source of biomarkers because its composition is relatively simple and it can be collected easily and noninvasively. However, because blood carries many signals for various physiological and pathophysiological conditions, most studies on body-fluid biomarkers focus on blood. Existing studies rely on conventional machine learning methods in which features are selected from predefined feature sets, so their results depend largely on the selected features, and the processes of feature engineering and feature selection may yield incomplete or biased features. Compared with conventional machine learning techniques, deep learning methods can automatically learn complex feature representations from raw data. This work focuses on predicting saliva-secretory and blood-secretory proteins with deep learning.

For saliva-secretory protein prediction, we present a novel end-to-end deep learning model based on a multi-lane capsule network. The first step converts the input protein sequences into evolutionary profile matrices using the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST). To address class imbalance in the dataset during training, the bagging ensemble learning method is applied to the training set. The proposed model achieves high accuracy under 10-fold cross-validation on the training set (0.905) and on an independent test set (0.888), outperforming existing methods based on traditional machine learning algorithms. Comparing human saliva-secretory proteins detected experimentally in other studies with the predictions of our model gives a true positive rate (TPR) of 89%. Comparing known salivary protein biomarkers of cancer with the predictions of our model gives an average TPR of 88%.

For blood-secretory protein prediction, we present a novel deep learning model combined with transfer learning, which integrates a binary classification network and a ranking network to identify blood-secretory proteins from amino acid sequence information alone. The training loss applies a descriptive loss to the binary classification network and a compactness loss to the ranking network. The feature extraction subnetwork of the proposed model is a multi-lane capsule network. For binary classification, the proposed model achieves high accuracy under 10-fold cross-validation on the training set (0.915) and on an independent test set (0.917), outperforming existing methods based on traditional machine learning algorithms and state-of-the-art deep learning architectures for biological sequence analysis. Comparing human blood-secretory proteins detected experimentally in other studies with the predictions of our model gives a TPR of 0.895. Comparing known blood-based biomarkers of colorectal cancer and lung cancer with the predictions of our model gives average TPRs of 0.878 and 0.858, respectively.

The main contributions of this work are as follows:
(1) A novel deep learning model based on a multi-lane capsule network is proposed; it achieves good performance and outperforms existing methods based on traditional machine learning algorithms and state-of-the-art deep learning architectures for saliva-secretory and blood-secretory protein prediction.
(2) The proposed secretory protein prediction model uses only amino acid sequences, which overcomes the heavy reliance of existing methods on annotated protein features.
(3) The secretory proteins predicted by our model are statistically significant when compared with existing biomarkers of cancer.
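
The abstract mentions two steps that a small sketch can make concrete: bagging to counter class imbalance, and the true positive rate used to check predictions against experimentally detected proteins. The abstract does not say how the bags are built, so the Python sketch below assumes one common scheme, in which every bag pairs the minority (secretory) examples with an equally sized random sample of the majority class. The one-dimensional score, the threshold base learner and all function names are hypothetical stand-ins for illustration; they are not the thesis's capsule network or its code.

```python
import random
from statistics import mean


def make_balanced_bags(positives, negatives, n_bags, seed=0):
    """Build n_bags balanced training sets (assumed scheme, not from the thesis).

    Each bag holds k positives and k negatives, where k is the size of the
    smaller class, so the majority class is undersampled differently per bag.
    """
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    return [(rng.sample(positives, k), rng.sample(negatives, k))
            for _ in range(n_bags)]


def fit_threshold(bag_pos, bag_neg):
    """Toy base learner: midpoint between the class means of a 1-D score.

    Assumes positives tend to score higher than negatives.
    """
    return (mean(bag_pos) + mean(bag_neg)) / 2.0


def predict_ensemble(thresholds, x):
    """Majority vote over the base learners; True means 'secretory'."""
    votes = sum(1 for t in thresholds if x > t)
    return votes * 2 > len(thresholds)


def true_positive_rate(predictions, labels):
    """TPR = TP / (TP + FN), computed over the truly positive examples."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up scores: 4 secretory (positive) vs 10 non-secretory (negative) proteins.
positives = [0.8, 0.9, 0.7, 0.85]
negatives = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.35, 0.4, 0.1, 0.2]

bags = make_balanced_bags(positives, negatives, n_bags=5)
thresholds = [fit_threshold(p, n) for p, n in bags]

# Score the known positives, as in the comparison with experimental data.
predictions = [predict_ensemble(thresholds, x) for x in positives]
tpr = true_positive_rate(predictions, [True] * len(positives))
```

Undersampling per bag lets each base learner train on balanced data while the ensemble as a whole still sees most of the majority class. Other bagging variants, such as bootstrap sampling with replacement, would fit the abstract's description equally well.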
Keywords/Search Tags:Secretory Proteins, Deep Learning, Convolutional Neural Network, Capsule Network