The Research Of Prioritizing Disease Candidate Genes Based On Heterogeneous Network

Posted on: 2016-07-07  Degree: Master  Type: Thesis
Country: China  Candidate: N Xiong  Full Text: PDF
GTID: 2370330473464867  Subject: Software engineering
Abstract/Summary:
An important emerging topic in systems biology is elucidating the relationship between genes and human genetic diseases. With the development and application of high-throughput technologies, the volume of gene and protein interaction data has grown enormously. Using computational methods to analyze these data and extract useful information is of great value for understanding the genetic basis of disease, for predicting disease genes at the molecular level, and for deciphering human genetic diseases across the whole genome. Studies have shown that human genetic diseases are modular: genes associated with the same or similar diseases tend to lie close together in the gene interaction network. Because proteins are encoded by genes, protein interaction networks can be mapped onto gene networks. Building on this property, many approaches predict disease genes with a heterogeneous network constructed from a protein interaction network, a phenotype similarity network, and phenotype-gene relationships, but these approaches have limitations.

A representative heterogeneous-network method is RWRH, a random walk algorithm grounded in graph theory. RWRH takes the known disease genes as seed nodes for a random walk and, once the gene scores reach a steady state, ranks the candidate disease genes by score. However, because protein interaction data are incomplete and noisy, the algorithm has limitations. Biological databases also hold a large body of text annotation describing the biological processes and molecular functions in which genes take part, and these annotations form the semantic information of a gene. If a candidate gene is semantically very similar to a disease gene, semantic similarity can compensate for a weak association between the corresponding proteins. Studies show that GO annotation is a very effective semantic resource for identifying pathogenic genes. Therefore, drawing on two data resources, protein-protein interaction (PPI) data and GO annotation, this thesis proposes a new method for predicting gene-phenotype relationships that uses GO annotation to optimize the protein interaction network. Compared with other methods, our method greatly improves prediction performance.

In addition, existing improvements to RWRH mostly optimize the protein interaction network by incorporating semantic similarity, gene sequence similarity, or biological pathway similarity, and few address the topological defects of the PPI network. We therefore also apply a statistical adjustment to the RWRH results and rank the candidates by the corrected scores. Our experimental results show that prediction performance improves significantly. In summary, we propose two algorithms, RWRH-GO and RWRHD, to examine and address the shortcomings of RWRH on heterogeneous networks. They represent two strategies: the first combines PPI data with other biological data, such as GO semantic similarity, to optimize the PPI data and thereby increase the reliability of the heterogeneous network; the second targets the topological defects of the heterogeneous network itself by correcting the prediction results afterwards. We hope these results can serve as a reference for disease gene prediction based on heterogeneous networks.
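To make the seed-and-rank idea concrete, here is a minimal sketch of a generic random walk with restart on a small weighted graph. It is not the thesis's RWRH, RWRH-GO, or RWRHD implementation: the toy graph, the node names, the edge weights, the restart probability of 0.7, and the function names `random_walk_with_restart` and `rank_candidates` are all assumptions made for illustration. Gene and phenotype nodes are simply placed in one graph, without the separate inter-network jump handling a full heterogeneous-network method would need.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-12, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until it stabilizes.

    adj maps each node to a dict {neighbor: weight} (symmetric, no isolated
    nodes assumed). restart=0.7 is an illustrative value, not from the thesis.
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    # Total edge weight per node, used to normalize outgoing transitions.
    strength = {n: sum(adj[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / strength[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:  # scores have reached a steady state
            break
    return p


def rank_candidates(scores, seeds, prefix="gene:"):
    """Sort non-seed gene nodes by steady-state score, highest first."""
    candidates = [n for n in scores if n.startswith(prefix) and n not in seeds]
    return sorted(candidates, key=lambda n: scores[n], reverse=True)


# Hypothetical toy network: one phenotype linked to one known disease gene,
# plus a few gene-gene interaction edges with made-up weights.
edges = [
    ("pheno:P1", "gene:A", 1.0),
    ("gene:A", "gene:B", 1.0),
    ("gene:A", "gene:C", 0.5),
    ("gene:B", "gene:C", 1.0),
    ("gene:C", "gene:D", 1.0),
]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w

seeds = {"pheno:P1", "gene:A"}
scores = random_walk_with_restart(adj, seeds)
ranked = rank_candidates(scores, seeds)
print(ranked)
```

In this sketch an edge weight could stand in for interaction confidence, so a GO-based reweighting of PPI edges would change only the values in `edges`, and a degree-based correction would be applied to `scores` before ranking; both are left out here because the abstract does not give their formulas.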
Keywords/Search Tags:Heterogeneous network, Random walk, Semantic similarity, Protein interaction network, Degree adjusted