Research on Drug-Protein Interaction Prediction Method Based on Multi-Information Fusion

Posted on: 2022-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: B X Chen
GTID: 2511306323951249
Subject: Software engineering
Abstract/Summary:
Identifying interactions between drugs and target proteins is a critical step in drug development, as it helps identify new targets for drugs and accelerates development. Exploring new therapeutic effects for existing drugs also helps reduce the cost of drug research and development. Most previous prediction methods considered drug and protein data from a single source only. That is, they did not consider drug- and protein-related information from multiple dimensions, nor did they fully account for the sparseness of the data itself. It is therefore necessary to develop prediction methods that integrate multiple kinds of drug- and protein-related data. We propose three prediction methods that integrate multiple sources of biological information: NGDTP, based on non-negative matrix factorization and a gradient boosting decision tree (GBDT); AEFM, based on autoencoders and GBDT; and GCAEF, an ensemble learning model based on graph convolution.

NGDTP is a drug-protein interaction prediction method based on non-negative matrix factorization and a gradient boosting decision tree. Our data contain 1923 known drug-protein interactions (positive samples) and more than 1 million unknown interactions (negative samples), which creates a class imbalance. The negative samples nevertheless carry useful information and can improve prediction. NGDTP is an ensemble model that makes full use of this negative-sample information. It builds multiple decision trees for each drug-protein pair and adds up the scores of all trees to obtain the final association score. The scores are then sorted from high to low to predict the proteins most likely to interact with each drug.

AEFM is a prediction method based on autoencoders and GBDT. The autoencoders are used mainly for dimensionality reduction and for fusing information from multiple sources. Class imbalance has little effect on GBDT's predictions, which helps in the final prediction of drug-protein interactions. AEFM integrates the drug, protein, and disease data by training an autoencoder on them, which reduces noise and makes the data denser. The resulting low-dimensional feature matrix is then used as the input to the GBDT model, which outputs an interaction score for each drug-protein pair. We rank all scores in descending order to highlight the most likely drug-protein interactions.

GCAEF is an ensemble learning model based on a graph convolutional network (GCN). Because our original data are too sparse, we use a GCN-based autoencoder to learn feature representations of drug nodes and protein nodes in a low-dimensional space. As a graph neural network, a GCN can make full use of the attribute and topological information of the nodes in the drug network (or protein network) to learn low-dimensional feature vectors for the nodes. We then use an ensemble learning model based on a gradient boosting decision tree to predict the likelihood of interaction between drug-protein pairs from their feature vectors.
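
The scoring step shared by all three methods (build several trees, add up their scores for a drug-protein pair, then sort the candidates from high to low) can be illustrated with a minimal sketch. It is a toy under stated assumptions and not the thesis implementation: the trees are one-split stumps fitted to squared-error residuals, the one-dimensional drug and protein features are hand-made stand-ins for the learned low-dimensional features, and every name (`fit_stump`, `fit_boosting`, `pair_score`, `rank_proteins`, `D1`, `P1`, and so on) is hypothetical.

```python
# Toy sketch only: not the thesis code. All names and data below are hypothetical.
from typing import Dict, List, Tuple

# A stump is (feature index, threshold, left value, right value).
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Return the one-split tree with the lowest squared error on the residuals."""
    mean = sum(residuals) / len(residuals)
    best: Stump = (0, float("inf"), mean, mean)  # fallback when no split is usable
    best_err = float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    return best


def stump_score(stump: Stump, row: List[float]) -> float:
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_boosting(X: List[List[float]], y: List[float],
                 n_trees: int = 20, lr: float = 0.5) -> List[Stump]:
    """Fit stumps one at a time, each on the residuals left by the previous ones."""
    trees: List[Stump] = []
    preds = [0.0] * len(X)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, t, lv, rv = fit_stump(X, residuals)
        stump = (j, t, lr * lv, lr * rv)  # fold the learning rate into the leaf values
        trees.append(stump)
        preds = [p + stump_score(stump, row) for p, row in zip(preds, X)]
    return trees


def pair_score(trees: List[Stump], drug_vec: List[float], protein_vec: List[float]) -> float:
    """Final score of a drug-protein pair: the sum of the scores of all trees."""
    return sum(stump_score(s, drug_vec + protein_vec) for s in trees)


def rank_proteins(trees: List[Stump], drug_vec: List[float],
                  proteins: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Sort the candidate proteins for one drug from highest to lowest score."""
    scored = [(name, pair_score(trees, drug_vec, vec)) for name, vec in proteins.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hand-made one-dimensional features (assumed for illustration only).
drugs = {"D1": [1.0], "D2": [0.0]}
proteins = {"P1": [1.0], "P2": [0.0]}
known = {("D1", "P1")}  # the only known (positive) pair in this toy set

# Each training row is the drug vector concatenated with the protein vector.
X = [drugs[d] + proteins[p] for d in drugs for p in proteins]
y = [1.0 if (d, p) in known else 0.0 for d in drugs for p in proteins]

trees = fit_boosting(X, y)
ranking = rank_proteins(trees, drugs["D1"], proteins)
print(ranking)  # candidate proteins for D1, highest score first
```

Unknown pairs are kept as zero-labelled training rows rather than discarded, which mirrors the abstract's point that negative samples still carry information. A practical version would more likely use an established gradient boosting library and learned features instead of these hand-written stumps.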
Keywords/Search Tags: drug–protein interactions, gradient boosting decision tree, non-negative matrix factorization, autoencoder, graph convolutional network