Graph-based Machine Learning Algorithms For Microbe Network Prediction

Posted on: 2022-04-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Long
Full Text: PDF
GTID: 1480306731983479
Subject: Computer Science and Technology
Abstract/Summary:
Since the start of the 21st century, the life sciences have developed continuously, and revealing the mysteries of life and protecting human health have become more urgent. After the Human Genome Project was completed, researchers began to recognize the importance of human microbes, which play crucial roles in human health and disease, and various human microbiome projects were launched. Elucidating the disease-causing mechanisms of human microbes by systematically studying microbe-related networks is one of the most important research tasks in bioinformatics and computational biology. High-throughput sequencing and screening techniques have developed rapidly and now produce large amounts of biological data, including genomic, proteomic, microbiome, metagenomic and metabolomic data. These data provide valuable opportunities for research on human microbes, but effectively mining useful information from them remains challenging. Systematic research on microbe-related networks is of great significance for understanding the pathogenic mechanisms of microbes in depth, promoting drug development and the application of microbes in precision medicine, and providing a theoretical basis for disease prevention, diagnosis and treatment. Traditional experimental methods for studying microbes, however, are time-consuming, costly and risky. In silico methods are a cost-effective alternative. In this thesis, based on biological networks and deep learning techniques, we mainly study algorithms for predicting microbe-disease and microbe-drug associations. The main research work is summarized as follows:

(1) Although many random walk-based algorithms have been developed for microbe-disease association prediction, most of them ignore network topological information, and the prediction results of most existing methods still leave room for improvement. To address these issues, we developed NTSHMDA, a novel computational model that predicts human microbe-disease associations with a random walk that integrates network topological information. First, based on known microbe-disease associations, we used the Gaussian interaction profile kernel to construct a microbe similarity network and a disease similarity network, and then built a heterogeneous network by integrating the disease similarity network, the microbe similarity network and the microbe-disease association network. Second, on the assumption that different neighbours differ in importance, we re-evaluated the edge weight of each microbe-disease pair by integrating network topological characteristics. According to the source of this characteristic information, i.e., microbe similarities or disease similarities, we reconstructed two novel heterogeneous networks. Third, we predicted microbe-disease associations by running an optimized random walk algorithm on these two heterogeneous networks. Cross-validation experiments indicated that NTSHMDA achieved higher AUC values than existing algorithms in identifying potential microbe-disease associations.

(2) Most previous methods for microbe-disease prediction depend strongly on known microbe-disease associations to obtain similarity information. In addition, the majority of existing methods are limited in making predictions for new microbes (or diseases) with few or no known associations. We therefore proposed GATMDA, a novel deep learning model for microbe-disease prediction based on the graph attention network (GAT). First, because it is difficult to obtain similarities for new microbes and new diseases from known associations, we constructed rich microbe features and disease features by incorporating multiple types of biological data. Second, to avoid information loss and strengthen representation learning, we optimized the standard GAT by propagating information between different self-attentions and used the optimized GAT to learn node representations. We then designed a bi-interaction aggregator based on a multi-layer perceptron (MLP) to aggregate the representations of a node and its neighbours more accurately. Third, to identify complex microbe-disease associations, we combined the model with matrix completion to reconstruct the microbe-disease bipartite network and predicted latent disease-causing microbes according to their predicted scores. Experimental results under three different scenarios showed that GATMDA outperformed baseline methods and was suitable for predictions involving new microbes and new diseases.

(3) In view of the increasingly serious problem of microbial resistance and the extremely slow development of new drugs, we proposed GCNMDA, a model for microbe-drug association prediction based on the graph convolutional network (GCN). First, we constructed a drug similarity network (or features) and a microbe similarity network (or features) by taking into account Gaussian kernel similarities, drug chemical similarity and microbe functional similarity. We then ran random walk algorithms on the two networks to extract more valuable features. Second, because the Conditional Random Field (CRF) is powerful in identifying similar nodes, we added a CRF layer to the standard GCN model so that similar nodes have similar representations. In the CRF layer, we also adopted an attention mechanism to strengthen the aggregation of representations from more important neighbours. Finally, based on the learned node representations, we predicted microbe-drug associations by reconstructing the microbe-drug interaction network. Experimental results on three datasets with different densities indicated that GCNMDA consistently outperformed baseline methods and was relatively robust across datasets.

(4) While GCNMDA shows good prediction results on different datasets, its prediction accuracy can still be improved by fully exploiting prior biomedical data. In addition, GCNMDA cannot make predictions for all new microbes and new diseases. To deal with these issues, we developed EGATMDA, a novel ensemble graph attention network-based deep learning model for microbe-drug prediction. First, we used microbial genome sequence data to construct microbe features, and constructed drug features by combining drug chemical structure information with drug Gaussian kernel similarity. Meanwhile, we constructed multiple heterogeneous networks by incorporating multiple sources of biomedical data. Second, because the neighbours of a given node within one network can differ in importance, and because different graphs can carry distinct semantic information about that node, we designed a dual-attention mechanism to learn node representations. Third, based on the learned node representations, we predicted potential microbe-drug associations by reconstructing the microbe-drug bipartite network. Experimental results showed that EGATMDA achieved higher AUC and AUPR values than state-of-the-art methods. Results under the scenarios for new microbes and diseases demonstrated that EGATMDA could be successfully applied to new microbes and diseases.

In summary, human microbes play a critical role in human health and disease. This project combines graph-based machine learning with network science to model microbe-related problems on a large body of biomedical data. We proposed multiple computational models for microbe network analysis, covering microbe-disease prediction and microbe-drug prediction. These models are of great significance for gaining deep insight into the pathogenic mechanisms of microbes, accelerating the development of new drugs, and promoting the application of microbes in personalized therapy and precision medicine.
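
The abstract names two building blocks without giving formulas: Gaussian kernel similarities computed from known associations, and a random walk on a heterogeneous microbe-disease network. The sketch below shows the commonly used form of each, under stated assumptions: the bandwidth is normalised by the mean squared norm of the interaction profiles, the walk is a plain random walk with restart, and the function names, the toy association matrix and the restart probability of 0.5 are all hypothetical. It is not the NTSHMDA algorithm itself, and it omits the topology-based reweighting of edges that the thesis describes.

```python
import numpy as np


def gip_kernel(profiles):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Each row is the binary association profile of one entity (for example,
    one microbe against all diseases). Assumes at least one nonzero entry.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    # Assumed convention: bandwidth normalised by the mean squared norm.
    gamma = 1.0 / sq_norms.mean()
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a random walk with restart."""
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # leave isolated nodes as all-zero columns
    transition = adjacency / col_sums  # column-stochastic transition matrix
    p0 = np.asarray(seeds, dtype=float)
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (transition @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy data (hypothetical): 4 microbes x 3 diseases, 1 = known association.
associations = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
])
n_microbes, n_diseases = associations.shape

microbe_sim = gip_kernel(associations)    # 4 x 4
disease_sim = gip_kernel(associations.T)  # 3 x 3

# Heterogeneous network: similarity blocks on the diagonal,
# known associations on the off-diagonal blocks.
hetero = np.block([
    [microbe_sim, associations],
    [associations.T, disease_sim],
])

# Restart from disease 0 and rank microbes by visiting probability.
seeds = np.zeros(n_microbes + n_diseases)
seeds[n_microbes + 0] = 1.0
scores = random_walk_with_restart(hetero, seeds)
microbe_ranking = np.argsort(-scores[:n_microbes])
print(microbe_ranking)
```

In this sketch a candidate microbe is ranked higher for a disease the more probability mass the walk places on it, whether it arrives through known associations or through similarity edges. Column normalisation is one of several possible choices; row normalisation or a symmetric normalisation would work as well.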
Keywords/Search Tags: Biological Networks, Graph Convolutional Neural Network, Graph Attention Neural Network, Disease-causing Microbes, Microbe-Drug Associations