
Research On Pathogenic Gene Prediction Algorithm Based On Heterogeneous Information Fusion

Posted on: 2020-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: X P Wang
GTID: 2370330590973268
Subject: Software engineering
Abstract/Summary:
Complex diseases seriously affect people's physical and mental health, and identifying disease-causing genes has long been a research goal. Traditional biomedical methods suffer from long experimental cycles and high cost. With the emergence of bioinformatics and the rapid development of biotechnology, researchers have proposed many gene prioritization algorithms that mine pathogenic genes from large volumes of biological data. However, the currently known gene-disease association matrix is still very sparse and lacks negative evidence (confirmed non-associations between genes and diseases), which limits the prediction performance of gene prioritization algorithms.

Based on the hypothesis that mutations in functionally related genes may lead to similar disease phenotypes, this thesis proposes a PU inductive matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate pathogenic genes of human diseases. On the one hand, PUIMCHIF uses different compact feature learning methods to extract gene and disease features from multiple data sources, compensating for the sparsity of the association data. Specifically, random walk with restart (RWR) and diffusion component analysis (DCA) are used to learn low-dimensional network features of genes and diseases, and a denoising autoencoder (DAE) reduces the dimensionality of their high-dimensional data features. On the other hand, based on the prior knowledge that most unknown gene-disease pairs are not associated, a PU-Learning strategy treats the unknown associations as negative examples for biased learning.

We conducted several experiments to verify the validity of PUIMCHIF. Its results are significantly better than those of other algorithms on three evaluation metrics: accuracy, recall, and mean percentile ranking (MPR). In the top-100 global prediction analysis across multiple genes and multiple diseases, PUIMCHIF recovers true gene associations with a probability of up to 50% and achieves an MPR of 10.94%; the experiments confirm that it ranks true associations higher than other methods such as IMC and CATAPULT.
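
As a rough illustration of three building blocks named in the abstract (RWR network diffusion, the biased PU loss, and MPR), the following Python sketch may help. The abstract does not give the thesis's own formulation, so everything here is an assumption made for illustration: the function names, the restart probability of 0.5, the unlabeled-entry weight `alpha = 0.2`, and the particular MPR definition (percentile of the held-out gene among all candidate genes for a disease, where lower is better).

```python
import numpy as np


def rwr_diffusion(adj, restart=0.5, tol=1e-8, max_iter=1000):
    """Random walk with restart (RWR) started from every node of a network.

    adj     : (n, n) non-negative adjacency matrix (hypothetical input)
    restart : restart probability (0.5 is an illustrative value only)
    Returns an (n, n) matrix whose row i is the visiting distribution
    of a walker that keeps restarting at node i.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    row_sums = adj.sum(axis=1, keepdims=True)
    isolated = row_sums[:, 0] == 0
    row_sums[isolated] = 1.0               # avoid division by zero
    trans = adj / row_sums                 # row-stochastic transition matrix
    trans[isolated, isolated] = 1.0        # isolated nodes stay where they are
    start = np.eye(n)                      # one restart vector per node
    state = start.copy()
    for _ in range(max_iter):
        nxt = (1 - restart) * state @ trans + restart * start
        if np.abs(nxt - state).max() < tol:
            return nxt
        state = nxt
    return state


def pu_weighted_loss(assoc, scores, alpha=0.2):
    """Biased squared loss for positive-unlabeled (PU) learning.

    assoc  : 0/1 gene-disease matrix (1 = known association, 0 = unknown)
    scores : predicted association scores of the same shape
    alpha  : weight (< 1) on unknown entries, which are treated as
             negatives but trusted less than the known positives
    """
    assoc = np.asarray(assoc, dtype=float)
    scores = np.asarray(scores, dtype=float)
    weights = np.where(assoc > 0, 1.0, alpha)
    return float(np.sum(weights * (assoc - scores) ** 2))


def mean_percentile_rank(scores, held_out):
    """Mean percentile ranking (MPR) in percent; lower is better.

    scores   : (genes, diseases) matrix of predicted scores
    held_out : iterable of (gene, disease) true pairs hidden in training
    """
    scores = np.asarray(scores, dtype=float)
    n_genes = scores.shape[0]
    pct = []
    for g, d in held_out:
        # number of genes scored strictly above the true gene for disease d
        better = np.sum(scores[:, d] > scores[g, d])
        pct.append(better / (n_genes - 1))
    return 100.0 * float(np.mean(pct))
```

In a pipeline of the kind the abstract describes, the rows returned by `rwr_diffusion` would presumably be compressed further (for example by DCA) before serving as gene or disease features, and `pu_weighted_loss` would be minimized over the parameters of the matrix completion model rather than evaluated on fixed scores as it is here.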
Keywords/Search Tags: pathogenic gene prediction, heterogeneous information fusion, compact feature learning, PU-Learning