
Research On Algorithms For Construction And Refining Of Functional Similarity Network Based On Gene Ontology

Posted on: 2018-03-05   Degree: Doctor   Type: Dissertation
Country: China   Candidate: Z Tian
GTID: 1360330566498808   Subject: Computer Science and Technology
Abstract/Summary:
Gene Ontology (GO) describes the properties of genes and gene products in three aspects: cellular component, biological process, and molecular function. Measuring the functional similarity of genes based on GO is valuable for gene function analysis and prediction. Following the idea of data fusion, prioritizing human disease genes by integrating various types of biological data has also become an active research topic. Starting from GO, this dissertation focuses on four problems:
- gene functional similarity calculation;
- acceleration of that calculation;
- construction and refinement of the human gene functional similarity network;
- identification of human disease genes.

The main contributions fall into the following four aspects.

(1) A novel method for measuring gene functional similarity based on weighted inherited semantics

For gene functional similarity calculation, this thesis presents a method based on weighted inherited semantics. First, current methods cannot represent the specificity of terms when they compute term information content. We therefore define a new model for the semantic information content of terms that fully exploits the structural information of terms in GO. Second, current methods neglect the semantic overlap between terms in a term set, so they are error-prone when measuring the semantic information content of the set. We therefore propose the concept of weighted inherited semantics: the semantics of a term is divided into two parts, called weighted inherited semantics and extended semantics. The semantic information content of a term set is then computed from the weighted inherited semantics and extended semantics of its terms. Finally, the functional similarity between two genes is measured by the semantic overlap ratio between their annotated term sets. Experimental results show that the proposed algorithm performs well in many groups of experiments, indicating that it measures the functional similarity of genes more accurately.

(2) A novel method for speeding up gene functional similarity calculation based on a hash strategy

As research on GO deepens and biological experiment technology advances, computing gene functional similarity efficiently at large scale has become a research hotspot. Current gene functional similarity calculation methods and related tools are inefficient. To overcome this, we propose a strategy that speeds up the calculation by applying the computer-science idea of trading "space" for "time". The algorithm has two main steps. First, the stored information is transformed from the GO structure (a directed acyclic graph) into hash tables. Then, the functional similarity between genes is calculated quickly from the constructed hash tables. Similarity methods can retrieve the information they need directly from these tables, so they avoid traversing the GO structure repeatedly. Time complexity analysis shows that the strategy can significantly improve the efficiency of current gene functional similarity calculation methods. Experimental comparisons with other typical methods show that it has a clear advantage when calculating functional similarity across the whole genome.

(3) A novel method for refining the gene functional similarity network based on a reference network

In the post-genomic era, one important task is to study the interrelationships among molecules at the network level. In recent years, gene functional similarity networks, as well as other traditional molecular networks, have received increasing attention. However, because the gene functional similarity network is fully connected, it contains noise. We therefore propose a method to refine the gene functional similarity network based on a reference network. The method has three main steps:
1. It uses the results of different calculation methods to build an integrated gene functional similarity network, which overcomes the shortcomings of any single calculation method.
2. It builds a high-quality reference gene association network from several human protein interaction networks by fully mining the topological similarity of genes in those networks.
3. It refines the integrated gene functional similarity network using the reference gene association network.

Experimental results show that the refined gene functional similarity network (RGFSN) is consistent with other typical molecular interaction networks in its topological features and node degree distribution. This indicates that it has the characteristics of a biological network. The good performance of RGFSN in protein complex prediction experiments further demonstrates its soundness and effectiveness.

(4) A novel method for prioritizing disease genes based on gene similarity networks

Mining human disease genes can help prevent and treat diseases. Prioritizing disease genes based on molecular interaction networks has become a research hotspot. However, molecular interaction networks suffer from high false-positive rates and low coverage, which leaves room for improving the prediction results. Therefore, following the idea of data fusion, we use several kinds of gene biological data to construct gene similarity networks and thereby expand the search scope for human disease genes. The method proceeds in four steps:
1. Separate gene similarity networks are constructed from the GO annotation (GOA) data of genes and from the sequence data and domain data of proteins.
2. These gene similarity networks are fused with the similarity network fusion method.
3. A phenotype-gene bilayer network is constructed that combines the phenotype similarity network, the phenotype-gene association network, and the integrated gene similarity network.
4. The random walk with restart algorithm is applied to prioritize human disease genes.

Compared with related methods, experimental results show that the proposed method expands the search scope for disease genes and improves the accuracy of disease gene prediction.
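The abstract does not give the formulas of the weighted inherited semantics model used in aspect (1), so the following sketch swaps in a standard technique instead. It uses information content IC(t) = -log p(t), propagated over the GO DAG. The similarity of two term sets is the IC shared by their ancestor closures divided by the IC of the union of those closures, which is essentially the published simGIC measure. It only illustrates, under these assumptions, why a term inherited by several annotations should be counted once per term set. The toy terms `GO:A` to `GO:E` and their annotation counts are hypothetical.

```python
import math

# Hypothetical GO fragment: term -> set of direct parents (is_a edges only).
PARENTS = {
    "GO:A": set(),            # root
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B"},
    "GO:E": {"GO:B", "GO:C"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(direct_counts):
    """IC(t) = -log p(t), with p(t) the share of all annotations at or below t."""
    totals = dict.fromkeys(PARENTS, 0)
    for term, count in direct_counts.items():
        for anc in ancestors(term):      # an annotation to t also annotates its ancestors
            totals[anc] += count
    root_total = max(totals.values())    # the single root collects every annotation
    return {t: -math.log(c / root_total) for t, c in totals.items() if c > 0}


def overlap_ratio(terms1, terms2, ic):
    """IC shared by both ancestor closures divided by the IC of their union."""
    closure1 = set().union(*(ancestors(t) for t in terms1))
    closure2 = set().union(*(ancestors(t) for t in terms2))
    shared = sum(ic.get(t, 0.0) for t in closure1 & closure2)
    union = sum(ic.get(t, 0.0) for t in closure1 | closure2)
    return shared / union if union else 0.0


# Hypothetical direct annotation counts per term.
ic = information_content({"GO:D": 2, "GO:E": 3, "GO:C": 5})
print(round(overlap_ratio({"GO:D"}, {"GO:E"}, ic), 3))  # prints 0.186
```

Taking the union of ancestor closures before summing means the shared ancestor `GO:B` contributes its IC once, not once per annotated term.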
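The space-for-time idea of aspect (2) can be illustrated with Python dictionaries, which are hash tables. In this sketch, which assumes a table layout rather than reproducing the thesis's design, two tables are built:
- one maps each GO term to a frozen set of the term plus all its ancestors;
- the other maps each gene to the union of those sets for its annotated terms.

The DAG is traversed only while the tables are built. The simUI measure (Jaccard overlap of two closures) serves as an example consumer of the tables. All term and gene names are hypothetical.

```python
from itertools import combinations


def build_ancestor_table(parents):
    """Hash table term -> frozenset of the term and all its ancestors (built once)."""
    table = {}

    def closure(term):
        if term not in table:                # memoised: each term is expanded once
            result = {term}
            for parent in parents.get(term, ()):
                result |= closure(parent)
            table[term] = frozenset(result)
        return table[term]

    for term in parents:
        closure(term)
    return table


def build_gene_table(annotations, ancestor_table):
    """Hash table gene -> union of the ancestor closures of its annotated terms."""
    return {
        gene: frozenset().union(*(ancestor_table[t] for t in terms))
        for gene, terms in annotations.items()
    }


def sim_ui(a, b):
    """Jaccard overlap of two term closures (the simUI measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical GO fragment and gene annotations.
parents = {"GO:A": (), "GO:B": ("GO:A",), "GO:C": ("GO:A",),
           "GO:D": ("GO:B",), "GO:E": ("GO:B", "GO:C")}
annotations = {"g1": {"GO:D"}, "g2": {"GO:E"}, "g3": {"GO:C"}}

ancestor_table = build_ancestor_table(parents)
gene_table = build_gene_table(annotations, ancestor_table)

# All-pairs similarity now needs only lookups, never a DAG traversal.
scores = {(x, y): sim_ui(gene_table[x], gene_table[y])
          for x, y in combinations(sorted(gene_table), 2)}
print(scores)
```

Each pairwise query then costs two dictionary lookups plus one set intersection and one set union. The DAG walk happens once during construction, and the price is the memory the tables occupy.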
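Aspect (3) names three steps without giving their formulas. This NumPy sketch fills each step with an assumed, deliberately simple choice:
- the integrated network is the element-wise mean of several similarity matrices;
- a gene pair's topological similarity in a protein interaction network is the Jaccard index of their closed neighbourhoods;
- the reference network keeps gene pairs whose mean topological similarity reaches a hypothetical `threshold`;
- refinement zeroes every functional-similarity edge the reference network does not support.

The toy matrices are invented for illustration.

```python
import numpy as np


def integrate(similarity_matrices):
    """Integrated network: element-wise mean of several gene x gene matrices."""
    return np.mean(np.stack(similarity_matrices), axis=0)


def topological_similarity(adjacency):
    """Jaccard index of closed neighbourhoods in a binary interaction network."""
    a = (np.asarray(adjacency) > 0).astype(float)
    np.fill_diagonal(a, 1.0)                  # each gene belongs to its own neighbourhood
    shared = a @ a.T                          # |N(i) & N(j)|
    sizes = a.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - shared
    return shared / union


def reference_network(ppi_adjacencies, threshold):
    """Boolean reference network: mean topological similarity >= threshold."""
    score = np.mean([topological_similarity(a) for a in ppi_adjacencies], axis=0)
    ref = score >= threshold
    np.fill_diagonal(ref, False)
    return ref


def refine(integrated, reference_mask):
    """Drop functional-similarity edges that the reference network does not support."""
    refined = np.where(reference_mask, integrated, 0.0)
    np.fill_diagonal(refined, 0.0)
    return refined


# Two hypothetical similarity methods and two hypothetical PPI networks on 4 genes.
sim1 = np.array([[1, .8, .3, .1], [.8, 1, .6, .2], [.3, .6, 1, .7], [.1, .2, .7, 1]])
sim2 = np.array([[1, .6, .5, .2], [.6, 1, .4, .3], [.5, .4, 1, .9], [.2, .3, .9, 1]])
ppi1 = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
ppi2 = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

mask = reference_network([ppi1, ppi2], threshold=0.5)
refined = refine(integrate([sim1, sim2]), mask)
print(refined)
```

Here only the pairs (0, 1) and (2, 3) pass the reference filter, so the fully connected integrated network shrinks to two weighted edges.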
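Aspect (4) fuses the GOA-, sequence- and domain-based gene similarity networks with the similarity network fusion (SNF) method. Below is a simplified sketch of SNF's cross-diffusion update, written from the general description of that method. Each network gets a full kernel P (off-diagonal entries scaled so each row sums to 1/2, with self-similarity 1/2) and a sparse k-nearest-neighbour kernel S. Each P is then repeatedly replaced by S times the mean of the other P matrices times the transpose of S. The following are assumptions:
- the neighbour count `k` and the iteration count;
- excluding a gene from its own neighbour set;
- symmetrisation as the only normalisation.

Published SNF implementations add further normalisation steps. The three toy matrices are hypothetical.

```python
import numpy as np


def full_kernel(w):
    """P: off-diagonal similarities scaled so each row sums to 1/2; P[i, i] = 1/2."""
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)
    p = w / (2.0 * w.sum(axis=1, keepdims=True))
    np.fill_diagonal(p, 0.5)
    return p


def local_kernel(w, k):
    """S: keep each row's k largest off-diagonal similarities, then row-normalise."""
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)
    s = np.zeros_like(w)
    for i in range(w.shape[0]):
        nbrs = np.argsort(w[i])[-k:]           # indices of the k strongest neighbours
        s[i, nbrs] = w[i, nbrs]
    return s / s.sum(axis=1, keepdims=True)


def snf(similarities, k=2, iterations=20):
    """Fuse m >= 2 similarity matrices by iterative cross-diffusion."""
    m = len(similarities)
    p = [full_kernel(w) for w in similarities]
    s = [local_kernel(w, k) for w in similarities]
    for _ in range(iterations):
        updated = []
        for v in range(m):
            others = sum(p[u] for u in range(m) if u != v) / (m - 1)
            nxt = s[v] @ others @ s[v].T       # diffuse the other networks through S_v
            updated.append((nxt + nxt.T) / 2.0)
        p = updated                            # all views are updated simultaneously
    return sum(p) / m


# Hypothetical GOA-, sequence- and domain-based similarities for 4 genes.
g_go = [[1, .9, .2, .1], [.9, 1, .3, .2], [.2, .3, 1, .8], [.1, .2, .8, 1]]
g_seq = [[1, .7, .4, .1], [.7, 1, .2, .3], [.4, .2, 1, .6], [.1, .3, .6, 1]]
g_dom = [[1, .8, .1, .3], [.8, 1, .2, .1], [.1, .2, 1, .9], [.3, .1, .9, 1]]

fused = snf([g_go, g_seq, g_dom], k=2)
print(np.round(fused, 4))
```

The sparse kernel S passes information only along each network's strongest links. A similarity shared by several networks is therefore reinforced, while one seen in a single network tends to fade.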
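The last two steps of aspect (4) can be sketched as a block matrix over genes and phenotypes followed by a standard random walk with restart, p <- (1 - r) W p + r p0. Here W is the column-normalised bilayer matrix and p0 puts all restart mass on the query phenotype. This construction simply stacks the two similarity networks and the gene-phenotype association matrix into one matrix. The thesis may weight the layers or the jumps between them differently. The restart probability of 0.7 and the toy networks are assumptions.

```python
import numpy as np


def column_normalise(m):
    """Scale each column to sum to 1 (all-zero columns are left as zeros)."""
    sums = m.sum(axis=0, keepdims=True)
    sums[sums == 0] = 1.0
    return m / sums


def bilayer_matrix(gene_sim, pheno_sim, assoc):
    """Column-stochastic matrix over the node order [genes..., phenotypes...].

    gene_sim: n x n, pheno_sim: p x p, assoc: n x p gene-phenotype links.
    """
    top = np.hstack([gene_sim, assoc])
    bottom = np.hstack([assoc.T, pheno_sim])
    return column_normalise(np.vstack([top, bottom]).astype(float))


def rwr(transition, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) W p + r p0 until the L1 change drops below tol."""
    p0 = seeds / seeds.sum()
    p = p0.copy()
    for _ in range(max_iter):
        nxt = (1 - restart) * (transition @ p) + restart * p0
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p


# Toy data: genes g0-g1-g2 form a chain, phenotypes ph0 and ph1 are similar,
# and only g0 is already linked to the query phenotype ph0.
gene_sim = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
pheno_sim = np.array([[0, 1], [1, 0]], dtype=float)
assoc = np.array([[1, 0], [0, 0], [0, 0]], dtype=float)

w = bilayer_matrix(gene_sim, pheno_sim, assoc)
seeds = np.array([0, 0, 0, 1, 0], dtype=float)    # restart at phenotype ph0
scores = rwr(w, seeds)
gene_ranking = np.argsort(-scores[:3])            # candidate genes, best first
print(gene_ranking)                               # prints [0 1 2]
```

Because every column of W sums to 1, the scores remain a probability distribution. The genes are ranked by how much of the walker's stationary mass reaches them from the query phenotype.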
Keywords/Search Tags:Gene Ontology, Gene functional similarity, Functional similarity network, Network refining, Disease genes, Random walk