
Research On Random Walk Algorithm With Restarting Operator For Gene-phenotype Heterogeneous Networks

Posted on: 2019-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: F D Tan
GTID: 2428330548987412
Subject: Engineering
Abstract/Summary:
Heterogeneous information networks carry rich semantic information in their nodes and edges. Data mining on heterogeneous networks can uncover associations and hidden relationships that traditional data mining methods do not easily discover, so it has attracted growing attention in recent years. This thesis constructs a heterogeneous information network from protein interaction data, disease-gene relationship data, and phenotypic similarity data. A random walk algorithm is then used to prioritize candidate genes on the heterogeneous network in order to identify disease-causing genes.

The thesis first analyzes the classical RWRH model in detail, including its data set, the structure of its state transition matrix, and the full process of using a random walk to rank phenotypes and genes at the same time. The RWRH model is widely used to explore protein functions, drug-target relationships, and RNA-disease relationships. For predicting disease-causing genes, many algorithms based on the RWRH model have been proposed that change or improve the data sources or the structure of the heterogeneous network.

Based on the RWRH model, the thesis proposes the RWRHESER model, which uses an extended restart operator and an extended seed vector. A breadth-first search (k times) from the initial seed vector p0 forms the extended seed vector set PE, and the restart operator is extended accordingly. The extended restart operator Pe(k) is added to the random walk iteration formula. The proposed RWRHESER model effectively avoids the impact of initial seed node selection on the performance of the algorithm. During the ranking of candidate genes, the influence of the neighboring structural information around disease gene nodes and disease phenotype nodes in the heterogeneous network is strengthened.

The LapRWRH algorithm is one of the methods that improve the performance of the RWRH model in predicting disease-causing genes. The thesis applies the RWRHESER model to the LapRWRH algorithm and proposes the LapRWRH-ESER algorithm.

The heterogeneous information network is built by connecting the open-source protein interaction network from the HPRD database and the phenotype similarity network from MimMiner through the gene-phenotype relationship network from the OMIM database. The thesis compares the proposed RWRHESER model and LapRWRH-ESER algorithm with the classical RWRH model and the LapRWRH algorithm. Leave-one-out cross-validation is used to evaluate how well each method recovers gene-phenotype relationships, and the results show that the RWRHESER model and the LapRWRH-ESER algorithm successfully predict more disease genes.
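
The abstract does not state the iteration formula, so the following is only a minimal Python sketch of the general idea, not the thesis's implementation. It assumes the standard random-walk-with-restart update p_next = (1 - gamma) * W * p + gamma * restart on a column-normalized matrix W, and it assumes the heterogeneous network (gene, phenotype, and gene-phenotype blocks) has already been merged into one symmetric adjacency matrix. The function names, the `decay` weighting of breadth-first layers, and the toy graph are all hypothetical illustrations of how a seed set extended by k breadth-first steps could feed the restart vector.

```python
import numpy as np


def column_normalize(adj):
    """Turn an adjacency matrix into a column-stochastic transition matrix."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column
    return adj / col_sums


def bfs_layers(adj, seeds, k):
    """Return [layer_0, ..., layer_k]: nodes first reached after h BFS steps.

    Assumes an undirected graph, so row u of adj lists the neighbors of u.
    """
    adj = np.asarray(adj)
    visited = set(seeds)
    frontier = set(seeds)
    layers = [sorted(visited)]
    for _ in range(k):
        reached = set()
        for u in frontier:
            for v in np.nonzero(adj[u])[0]:
                if int(v) not in visited:
                    reached.add(int(v))
        visited |= reached
        layers.append(sorted(reached))
        frontier = reached
    return layers


def extended_restart_vector(n, layers, decay=0.5):
    """Hypothetical weighting: a node h hops from the seeds gets decay**h."""
    p = np.zeros(n)
    for hops, layer in enumerate(layers):
        for v in layer:
            p[v] = decay ** hops
    return p / p.sum()


def random_walk_with_restart(W, restart, gamma=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - gamma) * W p + gamma * restart until the L1 change < tol."""
    p = restart.copy()
    for _ in range(max_iter):
        p_next = (1 - gamma) * (W @ p) + gamma * restart
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy path graph 0-1-2-3-4 standing in for the merged heterogeneous network.
adj = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
layers = bfs_layers(adj, seeds=[0], k=1)
restart = extended_restart_vector(len(adj), layers)
scores = random_walk_with_restart(column_normalize(adj), restart)
ranking = np.argsort(-scores)  # node indices from highest to lowest score
```

In this sketch, candidate nodes would be ranked by their entries in `scores`; spreading the restart mass over the breadth-first layers is one possible way to make the ranking less dependent on the exact choice of the initial seed node.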
Keywords/Search Tags: Heterogeneous Network, Random Walk, Pathogenic Gene, Disease Phenotype