Network Analysis Methods For Identifying The Molecular Mechanisms Of Disease Phenotypes

Posted on: 2021-05-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Yang
GTID: 1364330614972262
Subject: Computer Science and Technology
Abstract/Summary:
Associating disease phenotypes with genotypes is a core research topic in modern biomedicine. Identifying the causal genes associated with disease phenotypes and investigating the function of these genes in pathogenesis is a major goal of biomedical research. Traditional disease gene identification methods, e.g., linkage mapping and genome-wide association studies, rely mainly on biological experiments, and the full experimental analysis often takes considerable labor and time. With the completion of the Human Genome Project and the maturation of high-throughput sequencing technology, massive amounts of biomedical experimental data have accumulated. Using computational methods that draw on phenotype- and genotype-related data to predict disease genes has proved to be an efficient approach. In recent years, researchers have proposed a large number of computational prediction methods with high prediction performance for disease genes.

However, several problems and challenges remain: (1) Prediction methods based on network propagation are susceptible to the core nodes in the networks, which may bias the prediction results toward the core genes in those networks. (2) For prediction methods based on embedding representation, making full use of the existing heterogeneous associations to construct the context information of diseases and genes is still a challenge. The impact of different heterogeneous information on the performance of prediction algorithms also remains to be studied. (3) Most current prediction methods are still affected by the incompleteness of the interaction omics data and the research bias of the current phenotype-genotype associations. (4) Because medical association data are sparse and complex, designing a deep neural network model that predicts causal genes from the multi-view features of diseases and genes remains a challenge. (5) There is still a lack of a 
reliable symptom-gene association dataset for the study of symptom gene prediction. Therefore, in this dissertation, we propose three novel disease gene prediction methods and apply the model based on heterogeneous network representation to the task of symptom gene prediction. Our study mainly includes the following contents.

(1) In prediction methods based on embedding representations, it is difficult to construct appropriate context information for diseases and genes by integrating phenotype- and genotype-related heterogeneous associations. We therefore propose a heterogeneous network embedding representation framework for disease gene prediction (HerGePred). Based on this framework, we propose two specific disease gene prediction algorithms: a prediction algorithm based on the similarity of the low-dimensional vectors (LVRSim) and a prediction algorithm that fuses network representation and network propagation (RW-RDGN). Analysis of disease interactions and recovery experiments on disease-gene associations showed that the low-dimensional vector features of diseases and genes effectively capture the structural information of the heterogeneous network. The RW-RDGN method uses the similarity between the low-dimensional vector features of diseases and genes to reconstruct the disease-gene heterogeneous network and applies a random walk algorithm to predict candidate genes. The experimental results showed that RW-RDGN achieved better performance on disease gene prediction than network propagation-based prediction methods.

(2) To address the problem that network propagation-based prediction methods are susceptible to the incompleteness of the interaction omics data and the research bias of the current phenotype-genotype associations, we propose a disease gene prediction method that fuses functional module correlation and network closeness (MapGene). This method leverages non-negative matrix factorization to incorporate the correlations derived from 
phenotype-genotype associations and the interactome network in order to obtain the functional modules related to diseases and genes. Meanwhile, MapGene uses the shortest path lengths in the interactome network to obtain the network closeness correlation between diseases and genes. The experimental results show that MapGene performs significantly better on disease gene prediction than current baseline methods. Analysis of the predicted candidate genes shows that MapGene can effectively alleviate the tendency to identify the core genes in the protein-protein interaction network. Furthermore, the functional module information helps explain how MapGene identifies candidate genes and helps obtain reliable candidates.

(3) To address the difficulty of applying deep neural networks to disease gene prediction, which stems from the sparsity and complexity of biomedical data, we propose a deep neural network model that fuses the multi-view features of diseases and genes to predict disease genes (DeepGN). The DeepGN model effectively integrates the multi-view features of diseases and genes and constructs sets of positive and negative samples as supervised information to optimize the parameters of the neural network and the deep features of diseases and genes. The experimental results indicate that our method achieves a significant improvement in prediction performance over the baseline methods. In addition, we conducted in-depth functional homogeneity and interaction analyses of the candidate genes; the results indicate that the candidate genes and the known genes of the diseases have high functional homogeneity and close gene interactions.

(4) In response to the current lack of reliable symptom-gene associations, we constructed a high-quality dataset of symptom-gene associations and a knowledge base (SymMap) that links symptoms from Chinese and Western medicine. Meanwhile, we apply the prediction methods based on 
heterogeneous network representation to symptom gene prediction and propose a symptom-related heterogeneous network embedding method (LSGER) to predict symptom genes. Experimental results show that the LSGER method achieves a significant improvement in prediction performance over the baseline methods. The symptom-gene association dataset we constructed and the rigorously screened set of predicted symptom-gene associations are useful for promoting the study of symptom gene prediction methods and the molecular mechanisms of symptoms.
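
As an illustration of the network propagation idea the abstract refers to, here is a minimal sketch of a generic random walk with restart on a toy gene network. It is not the RW-RDGN, MapGene, or any other method from the dissertation. The node names, the graph, and the restart probability of 0.3 are all hypothetical choices made for this example. Node `H` is given many neighbors to show how a well-connected node can outrank a less connected one (`A`) even though both are direct neighbors of the seed gene `S`.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visit probability of every node for a walker
    that, at each step, jumps back to a seed node with probability `restart`
    and otherwise moves to a uniformly chosen neighbor.

    Assumes every node in `adj` has at least one neighbor.
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed (known disease) genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Mass that restarts at the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Remaining mass is split evenly among each node's neighbors.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical undirected toy network (adjacency lists are symmetric).
toy_network = {
    "S": ["H", "A"],            # seed: a known disease gene
    "H": ["S", "B", "C", "D"],  # well-connected node
    "A": ["S", "E"],            # less connected neighbor of the seed
    "B": ["H"],
    "C": ["H"],
    "D": ["H"],
    "E": ["A"],
}

scores = random_walk_with_restart(toy_network, seeds={"S"})

# Rank non-seed nodes as candidate genes, highest score first.
candidates = sorted((n for n in scores if n != "S"),
                    key=lambda n: scores[n], reverse=True)
print(candidates)
```

In this toy setting the walker leaves `S` toward `H` and `A` with equal probability, yet `H` accumulates more probability because more neighbors feed back into it. This is one simple way to see the kind of bias toward core nodes that the abstract lists as a challenge for propagation-based methods.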
Keywords/Search Tags: Gene prediction for disease phenotypes, network embedding representation methods, network propagation methods, deep neural network, disease molecular mechanism analysis