The Research Of Disease-Causative Genes Prediction Based On Heterogeneous Network

Posted on: 2019-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Hao
Full Text: PDF
GTID: 2404330563958533
Subject: Software engineering
Abstract/Summary:
With the completion of the Human Genome Project, humanity has entered the "post-genome era." The project has given a new understanding of disease: the occurrence of many diseases is related to genes. Searching for disease-causing genes has therefore become an important problem in understanding disease mechanisms and in gene diagnosis, prevention, and treatment. The study of disease-causing genes also plays an important role in improving medical care, extending patient survival, and discovering drug targets. Computational prediction of disease-causing genes has become a major topic of bioinformatics research, now and for the future. Because phenotype data are scarce and network structure is complex, the performance of existing prediction methods still needs to improve. To address these two problems, this study carries out the following work.

To address the lack of phenotype data, this paper reconstructs the phenotype correlation network. First, existing phenotype-related data sets are processed to obtain partial phenotype correlations. Then, natural language processing techniques are applied to obtain new phenotype association data, and the extended phenotype data form a phenotype network. A network embedding algorithm maps the phenotype nodes to vectors, and the phenotype correlation network is reconstructed using the cosine distance between those vectors.

To address the complexity of network structure information, this paper proposes a disease-causing gene prediction framework based on network embedding, in which the embedding algorithm is selectable. The framework includes three steps:
(1) Combining the extended phenotype association data, protein-protein interaction data, and protein-phenotype associations to construct a protein-phenotype heterogeneous network.
(2) Generating network node vectors with the selected network embedding algorithm (DeepWalk or Node2Vec).
(3) Calculating the cosine similarity between the phenotype node vector and each gene node vector; the top-ranked genes by similarity are predicted to be the causative genes.

Experimental results show that this research increases the number of phenotype nodes. Compared with other existing methods under leave-one-out cross-validation, the proposed method improves the performance of disease-causing gene prediction.
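The three-step framework described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation. The node names, the toy edges, and the two-dimensional vectors are hypothetical placeholders. The sketch also omits the skip-gram training that DeepWalk or Node2Vec would run on the walks, so the vectors used for ranking are hand-written stand-ins for learned embeddings.

```python
import math
import random

# Step 1 (toy data, hypothetical): one adjacency map holds phenotype-phenotype,
# protein-protein, and protein-phenotype edges of the heterogeneous network.
EDGES = [
    ("P:phenoA", "P:phenoB"),  # phenotype-phenotype correlation
    ("G:gene1", "G:gene2"),    # protein-protein interaction
    ("G:gene2", "G:gene3"),    # protein-protein interaction
    ("G:gene1", "P:phenoA"),   # known protein-phenotype association
    ("G:gene3", "P:phenoB"),   # known protein-phenotype association
]


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Step 2, first half: uniform truncated random walks (DeepWalk-style).

    A real pipeline would feed these walks to a skip-gram model to learn
    node vectors; that training step is not shown here.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_genes(phenotype, vectors, top_k):
    """Step 3: rank gene nodes by cosine similarity to one phenotype node."""
    genes = [node for node in vectors if node.startswith("G:")]
    scored = [(g, cosine_similarity(vectors[phenotype], vectors[g])) for g in genes]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Placeholder embeddings (assumed, not learned from the walks above).
VECTORS = {
    "P:phenoA": [1.0, 0.0],
    "G:gene1": [0.9, 0.1],
    "G:gene2": [0.5, 0.5],
    "G:gene3": [0.0, 1.0],
}

if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    print(random_walks(adjacency, walk_length=5, walks_per_node=1)[:2])
    print(rank_genes("P:phenoA", VECTORS, top_k=2))
```

Under this sketch, leave-one-out cross-validation would remove one known protein-phenotype edge from EDGES, rebuild the vectors, and check how highly the held-out gene ranks for that phenotype.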
Keywords/Search Tags:Disease-Causative Genes Prediction, Phenotype-Gene Association Network, Network Embedding