Prediction Of Disease Genes Based On Function Information

Posted on: 2009-12-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Yuan
Full Text: PDF
GTID: 1114330371480850
Subject: Bio-IT
Abstract/Summary:
Human diseases are commonly associated with both environmental and genetic factors. Identifying genes that are likely to be involved in human genetic disease is of vital importance for understanding pathogenic mechanisms and improving clinical practice, and is a main aim of the post-genomic era. Linkage analysis or genome-wide association studies typically locate hundreds of candidate genes in a chromosomal region. Evaluating whether a candidate gene is the cause of a disease is therefore an important problem in disease gene identification. In recent years, several bioinformatics applications and online tools for prioritizing candidate genes have been developed and released to the public. Research shows that GO annotation is the most effective data resource among the various kinds of gene information. There is still considerable room to improve the performance of GO annotation-based methods, because the GO structure information is not fully exploited.

In this thesis, a new disease gene prediction method based on GO annotation is proposed. It allows users to score and prioritize candidate genes using a new functional similarity algorithm based on the shortest distance between GO terms in the DAG (directed acyclic graph). Validation tests show that the target gene was ranked in the top 10% with a 68.5% chance. The method shows promising performance compared to other existing tools.

Like many existing methods for disease gene prediction, the above method suffers from annotation bias, as it cannot deal with diseases lacking known causative genes. We have therefore developed a second computational method. It prioritizes genes in a chromosomal region according to their possible relation to a genetic disease lacking known causative genes, using a combination of text mining and gene-function similarity analysis. First, it mines disease-related GO terms from the MEDLINE/PubMed database and the GOA database. Then, it prioritizes the candidate genes with the functional similarity algorithm of the above method. As a complementary method, it can handle diseases lacking detailed GO annotation, which are ignored by the above method as well as by many other existing methods.

Existing disease gene prediction strategies, including the methods we have developed above, all focus on gene information. Few attempts have been made to systematically analyze relationships at the phenotype level. In this thesis, we formalized the descriptions of 1572 disease phenotypes contained in the OMIM database by mining MeSH C terms from the medical literature, then evaluated the similarity between phenotypes based on the MeSH C tree. We find that these similarities are positively correlated with a number of measures of gene function. Highly similar phenotypes may therefore be used to predict candidate genes for diseases.

In many cases, only the disease phenotype and the region of interest are available, and it is not known whether the disease has known causative genes or related GO annotation. For these cases, we developed a new web tool, CDGMiner, which combines the above two methods and adds a judgement module. It can handle diseases with known related GO annotation, diseases lacking known causative genes, and diseases with known genes but lacking sufficient functional annotations.
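
The abstract describes scoring candidate genes by a functional similarity built on the shortest distance between GO terms in the DAG, but it does not give the formula. The sketch below is only an illustration of that general idea, not the thesis's algorithm. It assumes a toy ontology with hypothetical term names, treats the DAG as undirected when measuring path length, converts a distance d into a similarity of 1 / (1 + d), and scores a gene by averaging the best match for each disease-related term. All of these choices, and every function name, are assumptions made for the example.

```python
from collections import deque

# Toy "is_a" edges (child -> parents). Term names are hypothetical
# placeholders, not real GO identifiers.
PARENTS = {
    "T:root": [],
    "T:a": ["T:root"],
    "T:b": ["T:root"],
    "T:a1": ["T:a"],
    "T:a2": ["T:a"],
    "T:b1": ["T:b"],
}


def build_undirected(parents):
    """Build an undirected adjacency map from child -> parents edges."""
    adj = {term: set() for term in parents}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
    return adj


ADJ = build_undirected(PARENTS)


def shortest_distance(a, b, adj=ADJ):
    """Number of edges on the shortest path between two terms (BFS)."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj[node]:
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # terms are not connected


def term_similarity(a, b):
    """Assumed mapping from distance to similarity: 1 / (1 + d)."""
    d = shortest_distance(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def gene_score(candidate_terms, disease_terms):
    """Average, over disease terms, of the best-matching candidate term."""
    if not candidate_terms or not disease_terms:
        return 0.0
    total = sum(
        max(term_similarity(c, t) for c in candidate_terms)
        for t in disease_terms
    )
    return total / len(disease_terms)


def prioritize(candidates, disease_terms):
    """Return candidate gene names sorted from highest to lowest score."""
    return sorted(
        candidates,
        key=lambda gene: gene_score(candidates[gene], disease_terms),
        reverse=True,
    )
```

With disease-related terms `["T:a1"]`, a hypothetical gene annotated with the sibling term `T:a2` would rank above one annotated with the more distant `T:b1`, since the first lies two edges away and the second four.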
Keywords/Search Tags: Bioinformatics, Disease Gene, Gene Ontology, Text Mining, Candidate Gene, Similarity, Phenotype