
The Research Of Drug-Gene Relationship Prediction Algorithm On Multiple Layers Heterogeneous Biological Information Network

Posted on: 2019-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: J X Bing
GTID: 2404330545997766
Subject: Computer technology
Abstract/Summary:
With the development of big data technology, data volumes keep growing and data types are becoming increasingly rich. Heterogeneous networks composed of multiple types of entities are widely used in many domains, such as literature databases, social networks, recommendation systems and biological information networks. These databases and systems are all heterogeneous information networks that combine multiple types of data. The key is to filter the usable data out of the whole dataset and extract the helpful information, which is how the value of big data is realized. Data mining on heterogeneous information networks is therefore a very active and challenging research direction. Among these applications, biological heterogeneous information networks are especially significant. A heterogeneous network made up of drug nodes and human gene nodes can reveal the interactions between these two kinds of biological entities. Studying the known associations in this network and predicting unknown ones can not only illustrate how drugs influence human gene expression, but also help estimate drug effects by detecting the level of disturbance in gene expression. For this purpose, this dissertation studies prediction algorithms for the biological heterogeneous information network consisting of drugs and genes, and compares the performance of novel and conventional methods. Specifically, the work comprises the following three tasks and innovations:

(1) Based on the drug-gene network, we added side-effect data to extend the network with relationships between drugs and side effects. We used representation learning to capture the structural features of the drug-gene heterogeneous network: two representation learning algorithms embed the structural and semantic features of the network into the representation vector of each node, and the prediction step is performed by the kernel Bayesian matrix factorization algorithm. Finally, to verify the effectiveness of the representation learning models for feature extraction, we compared the prediction results of the two models with those of existing algorithms on real data.

(2) We applied the inductive matrix completion algorithm to the drug-gene network. Building on the traditional matrix factorization method, we used a linear low-rank matrix model in place of the original latent relationship matrix in the matrix decomposition step. This makes up for a defect of conventional algorithms, which cannot predict the relationships of a new node, i.e. a node without any observed associations. The inductive matrix completion method was run on the biological heterogeneous information network built from real data on drugs, genes and side effects. To verify its predictive performance, we also compared its results with those of existing link prediction algorithms.

(3) We optimized the meta-path based algorithm. We analyzed the shortcoming of the meta-path structure, which cannot effectively capture the latent structural relevance in a heterogeneous network. On this basis, we introduced the concept of the meta graph and applied this structure to the link prediction problem on the biological heterogeneous information network. We used a random walk algorithm based on meta graphs to obtain useful meta structures from our drug-gene-ADR network, and obtained the latent features of drugs and genes with a meta-graph based matrix factorization method. We then fed the latent features into a factorization machine to predict the interactions between drugs and genes. Finally, we compared the results with three common link prediction algorithms.
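
To make the idea in (2) concrete, the following is a minimal sketch of inductive matrix completion, not the implementation used in the thesis. It assumes that each drug and each gene comes with a feature vector (rows of the hypothetical matrices X and Y) and models the association matrix as A ≈ X W Hᵀ Yᵀ with low-rank factors W and H. The function names, the plain gradient descent, the hyperparameters, and the simplification of fitting every entry of A (rather than only observed ones) are all assumptions made for illustration.

```python
import numpy as np

def fit_imc(A, X, Y, rank=2, lr=0.05, reg=1e-3, epochs=2000, seed=0):
    """Fit A ~ X @ W @ H.T @ Y.T with low-rank W, H (illustrative sketch).

    A: (n_drugs, n_genes) association matrix
    X: (n_drugs, d_x) drug features, Y: (n_genes, d_y) gene features
    Minimizes mean squared error over all entries plus L2 regularization.
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    scale = 1.0 / A.size  # average the squared error over all entries
    for _ in range(epochs):
        R = X @ W @ H.T @ Y.T - A  # residual of the current prediction
        grad_W = scale * (X.T @ R @ Y @ H) + reg * W
        grad_H = scale * (Y.T @ R.T @ X @ W) + reg * H
        W -= lr * grad_W
        H -= lr * grad_H
    return W, H

def predict_scores(x_new, W, H, Y):
    """Score one drug against all genes using only its feature vector."""
    return x_new @ W @ H.T @ Y.T
```

Because the score depends only on the feature vector, `predict_scores` can be called for a drug that has no observed associations at all, which is the cold-start case that plain matrix factorization cannot handle.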
Keywords/Search Tags:Link Prediction, Representation learning, Meta graph