
Research On SARS-CoV-2 Protein Function Prediction Method And Related Problems Based On Deep Learning

Posted on: 2023-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: L Yan
GTID: 2530307142954549
Subject: Mathematics
Abstract/Summary:
COVID-19, caused by SARS-CoV-2, is a highly contagious disease that has spread rapidly around the world and triggered an urgent health and socio-economic crisis. As SARS-CoV-2 protein data accumulate, bioinformatics can help clarify the biological significance of these large datasets. Antimicrobial peptides (AMPs) with anti-coronavirus (anti-CoV) functions are very small proteins that can alleviate SARS-CoV-2-related syndromes by regulating the expression levels of associated genes. In addition, quantifying changes in protein abundance and phosphorylation makes it possible to accurately predict phosphorylation sites in host cells infected with SARS-CoV-2 and to understand the molecular mechanisms of host cell regulation after infection, which can help in developing coronavirus-specific drugs. However, traditional experimental methods are time-consuming and laborious, so using deep learning to predict the function of SARS-CoV-2 proteins is particularly important. This thesis studies the prediction of SARS-CoV-2 protein function; the research content is as follows:

1. A deep-learning method for predicting anti-CoV peptides, AntiCVP-Deep, is proposed. First, six feature extraction methods are used to obtain the original feature vectors of AMPs, and K-means SMOTE is used to handle the imbalanced data. Then, the processed data are passed through the input, forget and output gates of a bidirectional long short-term memory network (BiLSTM) to select the optimal feature subset. Next, a self-attention mechanism assigns higher weight to important antimicrobial peptide information, which enhances the model's ability to learn features. Finally, the output of the self-attention layer is fed into a fully connected neural network (FCN) to predict anti-CoV peptides. The AUC values on all four datasets exceed 98%. The geometric mean (GMean) of antivirus, non-AVP, non-AMP and all-Neg recognition on the independent test sets reaches 90.05%, 92.63%, 99.46% and 91.24%, respectively. The results show that AntiCVP-Deep is helpful for identifying anti-CoV peptides.

2. A new model for SARS-CoV-2 phosphorylation site prediction, DE-MHAIPs, is proposed. First, six feature extraction methods are used to extract protein sequence information from different perspectives. For the first time, a differential evolution (DE) algorithm is used to learn the weight of each feature set and to fuse the multiple sources of information in a weighted combination. Next, Group LASSO is used to select a good feature subset. Then, multi-head attention assigns higher weight to important protein information. After that, the processed data are fed into a long short-term memory network (LSTM) to further enhance the model's ability to learn features. Finally, the LSTM output is fed into a fully connected neural network to predict SARS-CoV-2 phosphorylation sites. Under 5-fold cross-validation, the AUC values on the S/T and Y datasets reach 91.98% and 98.32%, respectively. On the independent test set, the AUC values of the two datasets reach 91.59% and 95.76%, respectively. The experimental results show that DE-MHAIPs exhibits excellent predictive ability compared with other methods.
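
The abstract reports GMean on imbalanced test sets without defining it. The following is a minimal sketch, assuming the definition commonly used for imbalanced binary classification, GMean = sqrt(sensitivity × specificity). The function name `gmean` and the toy labels are illustrative assumptions and are not taken from the thesis.

```python
import math


def gmean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumed convention (not stated in the abstract): 1 = positive, 0 = negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    # Guard against an empty class so the ratios are always defined.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy, made-up labels: 2 positives and 4 negatives (an imbalanced set).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
score = gmean(y_true, y_pred)  # sensitivity 0.5, specificity 0.75

# A classifier that always predicts the majority class scores zero,
# even though its plain accuracy on this toy set would be 4/6.
always_negative = gmean(y_true, [0] * len(y_true))
```

Because the score is a product, it collapses to zero whenever one class is never recognised, which is why it is a common complement to AUC when the classes are imbalanced.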
Keywords/Search Tags:anti-coronavirus peptides, SARS-CoV-2 phosphorylation, deep learning, multi-head attention mechanism, multi-information fusion, imbalanced algorithm, differential evolution algorithm, feature selection algorithm