Research On Term Similarity Computing And Extension In Gene Ontology

Posted on: 2016-04-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Peng
Full Text: PDF
GTID: 1108330503969671
Subject: Computer application technology
Abstract/Summary:
Gene Ontology (GO) is mainly used to describe the attributes of genes and gene products in three categories: molecular function, biological process, and cellular component. GO-based term similarity and term extension are of great benefit to gene function analysis, comparison, and prediction. Existing similarity measures are limited because they consider only a subset of the information in GO and because some information is missing from GO itself. As a result, current measures cannot accurately measure the similarity between terms and between genes. On the other hand, an automatic tool that predicts GO terms, and so adds knowledge to GO, is in demand. This dissertation focused on the difficulties and problems in GO-based similarity calculation and GO term prediction. The main contents are as follows.
First, GO is curated by domain experts using knowledge from experimental results, the literature, and other sources. GO cannot include all the known functional information about genes, which makes term similarity calculation inaccurate. To solve this problem, we proposed a novel GO term similarity approach called NETSIM (network-based similarity measure), which incorporates information from gene co-function networks in addition to the GO structure and annotations. To test the performance of NETSIM, we compared genes based on the term similarities that NETSIM calculated for yeast, Arabidopsis, and human. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrated that NETSIM improves the accuracy of GO term similarities with high robustness.
Second, GO includes three categories: molecular function, biological process, and cellular component. Discovering associative relationships across these categories may help researchers conduct biological reasoning and generate biological hypotheses. Currently, most researchers focus on calculating similarities within a single category rather than between terms in different root ontology categories.
In addition, the existing cross-category similarity measures consider only the text similarity of term names or the overlap of the terms' gene annotations. To solve this problem, we proposed a new cross-category similarity measure called CroGO, which incorporates genome-specific gene co-function network data. CroGO uses a propagation method to calculate the information content of cross-category terms and to solve the level location problem. A performance study on a gold-standard dataset showed that our measure outperforms the existing algorithms. We also generated genome-specific term association networks for yeast and human. An enrichment-based test showed that our networks are better than those generated by the other measures.
Third, measuring gene functional similarity based on term similarity is a popular research area. Such measures use the information contained in GO, including annotations, structure, and lowest common ancestors. Although there are dozens of GO-based gene functional similarity measures, each emphasizes only one or a few types of relationships between genes and ignores the others. To solve this problem, we proposed a novel integrative gene functional similarity measure called InteGO2, which automatically selects appropriate candidate measures and then integrates them using a metaheuristic search method. The experimental results show that InteGO2 significantly improves the performance of gene similarity in both the molecular function and biological process categories. Further evaluation shows that InteGO2 is highly robust: after we gradually removed the four best candidate measures, InteGO2 still had the highest performance, and after we added a random measure whose gene similarity scores are generated randomly, InteGO2 again had the highest performance.
Fourth, GO is curated by domain experts.
However, with the rapid growth of biological knowledge, especially gene function information, keeping GO up to date has become a serious problem. For example, biological literature, databases, and experimental datasets are published by the millions every year. It is hard for domain experts to translate these data into GO manually. To improve the efficiency of GO extension, a bioinformatics tool is needed that proposes new GO terms and helps curators update GO. To solve this problem, we proposed a new algorithm called GOExtender, which extends the GO structure by adding new GO terms based on gene network data. We first select candidate parent terms from GO and then predict the descendants of the selected terms. To evaluate GOExtender, we applied it to four versions of GO data (2007, 2009, 2011, and 2013) in both the biological process and cellular component categories. The evaluation showed that GOExtender is significantly better than the other existing methods. Furthermore, applying GOExtender to the most recent release of GO discovered new GO terms with literature support.
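The abstract refers to term similarity computed from the GO structure and annotations, and to gene functional similarity derived from term similarity. As a rough illustration of that general idea only, the sketch below uses a classic information-content approach (Resnik term similarity with a best-match average over genes). It is not NETSIM, CroGO, InteGO2, or GOExtender, and it uses no network data. The ontology, term names, gene names, and annotations are all hypothetical toy data.

```python
import math

# Toy ontology (hypothetical term IDs, not real GO terms): child -> parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Hypothetical direct annotations: gene -> terms.
ANNOTATIONS = {
    "g1": {"A1"},
    "g2": {"A2"},
    "g3": {"A1", "B1"},
    "g4": {"B1"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the toy ontology."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), where n_t counts genes annotated to t or a descendant."""
    genes_per_term = {t: set() for t in PARENTS}
    for gene, terms in ANNOTATIONS.items():
        for term in terms:
            # Annotations propagate upward to every ancestor.
            for anc in ancestors(term):
                genes_per_term[anc].add(gene)
    total = len(ANNOTATIONS)
    return {t: math.log(total / len(g)) for t, g in genes_per_term.items() if g}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC.get(t, 0.0) for t in common)


def gene_similarity(g1, g2):
    """Best-match average of term similarities between two genes' annotations."""
    a, b = ANNOTATIONS[g1], ANNOTATIONS[g2]
    best_a = [max(resnik(x, y) for y in b) for x in a]
    best_b = [max(resnik(x, y) for x in a) for y in b]
    return (sum(best_a) + sum(best_b)) / (len(a) + len(b))
```

In this toy data, `gene_similarity("g1", "g3")` is higher than `gene_similarity("g1", "g4")`, because g1 and g3 share the term A1 while g1 and g4 share only the root. A measure of this kind sees only annotations and structure, which is the limitation the abstract says the network-based methods are meant to address.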
Keywords/Search Tags: gene ontology, gene network, term similarity, gene functional similarity, term extension