Enhanced Inductive Matrix Completion For Gene-Disease Associations Prediction

Posted on: 2020-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Pu
GTID: 2404330590995505
Subject: Software engineering
Abstract/Summary:
With the rapid development of high-throughput sequencing and computing technology, biological data are growing explosively. Given this mass of biological information, exploring gene-disease associations efficiently is of great significance to the biomedical field. Accurate prediction helps researchers understand the functions of pathogenic genes and thereby supports the prevention and treatment of disease. Researchers worldwide have proposed many prediction algorithms for this problem and made some progress. However, most existing methods ignore inherent prior knowledge about the biological data. Moreover, existing biological databases are incomplete for many reasons, which makes the known gene-disease associations sparse and skewed. As a result, researchers often face sparsity and PU (Positive and Unlabeled) learning problems when designing methods.

To address these challenges, this thesis introduces Matrix Completion (MC) theory and models gene-disease association prediction as an Inductive Matrix Completion (IMC) problem. It then proposes enhanced inductive matrix completion models from two different perspectives. First, a method called Enhanced Inductive Matrix Completion with Prior Knowledge (EIMC_PK) is proposed. It uses sparse regularization to preserve the prior sparsity of gene-disease associations and manifold regularization to capture the correlation consistency of genes and diseases. Experimental results show that EIMC_PK outperforms other state-of-the-art methods.

Second, a method called Enhanced Inductive Matrix Completion with Katz (EIMC_Katz) is proposed to predict gene-disease associations. It first uses the Katz method to estimate gene-disease associations on the gene-disease heterogeneous network, which alleviates the effects of sparse gene-disease associations and of the PU problem. However, because it depends on the quality of the similarity network, the Katz method inevitably introduces some noise. To address this, elastic-net regularization is introduced into IMC to enhance robustness and improve the prediction of gene-disease associations. Extensive experimental results show that EIMC_Katz outperforms mainstream prediction algorithms. Finally, both methods can address the cold-start problem by integrating the features of genes and diseases.
Keywords: Gene-Disease Associations Prediction, Matrix Completion, Manifold Regularization, Heterogeneous Information Networks, Inductive Learning
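
The abstract mentions estimating gene-disease associations with the Katz method on a heterogeneous network but gives no formula. The following is a minimal sketch of one common form of that idea, a truncated Katz measure that sums damped walk counts over a block adjacency matrix. It is not the thesis's implementation: the function name `katz_scores`, the parameters `beta` and `max_len`, and the toy matrices are all assumptions made for illustration.

```python
import numpy as np

def katz_scores(gene_sim, disease_sim, assoc, beta=0.1, max_len=3):
    """Truncated Katz scores for every gene-disease pair (illustrative sketch).

    gene_sim:    (n_genes, n_genes) gene-gene similarity matrix
    disease_sim: (n_diseases, n_diseases) disease-disease similarity matrix
    assoc:       (n_genes, n_diseases) known associations (1 = known, 0 = unlabeled)
    beta:        damping factor; longer walks contribute less (assumed value)
    max_len:     longest walk length included in the sum (assumed value)
    """
    n_genes = assoc.shape[0]
    # Block adjacency matrix of the heterogeneous network:
    # genes and diseases are nodes, similarities and associations are edges.
    adj = np.block([[gene_sim, assoc],
                    [assoc.T, disease_sim]]).astype(float)
    scores = np.zeros_like(adj)
    walk = np.eye(adj.shape[0])
    for length in range(1, max_len + 1):
        walk = walk @ adj                  # weighted walk counts of this length
        scores += (beta ** length) * walk  # damp by beta^length and accumulate
    # The top-right block holds the gene-to-disease scores.
    return scores[:n_genes, n_genes:]

# Toy data (made up): gene 0 is similar to gene 1, gene 2 is isolated,
# and only the pair (gene 0, disease 0) is a known association.
gene_sim = np.array([[0, 1, 0],
                     [1, 0, 0],
                     [0, 0, 0]])
disease_sim = np.zeros((2, 2))
assoc = np.array([[1, 0],
                  [0, 0],
                  [0, 0]])

scores = katz_scores(gene_sim, disease_sim, assoc)
print(scores)
```

In this toy case gene 1 has no known association, yet it receives a nonzero score for disease 0 through the walk gene 1, gene 0, disease 0. This is the sense in which a Katz-style estimate can densify a sparse association matrix before a completion model is fitted, and also why a poor similarity network would inject noise into the scores.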