
Analyzing The Codon Usage Feature Of Disease Genes And Predicting Disease-causative Genes

Posted on: 2007-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Zhou
GTID: 2144360242461990
Subject: Computer application technology
Abstract/Summary:
The positional candidate strategy has become the main approach to identifying disease-causative genes. A key problem in this strategy is how to evaluate the risk of being a disease gene for each of the hundreds of candidate genes in a chromosomal region located by approaches such as linkage analysis. An effective solution would markedly shorten the time and reduce the cost of disease gene identification.
As more and more disease genes are identified, it becomes possible for computational methods to predict disease genes using knowledge learned from known disease genes. Three approaches of this kind have been reported recently, based respectively on gene expression profiles, gene function, and gene sequence characteristics. None of them solves the problem fully: the approaches based on expression profiles and GO annotation depend heavily on the completeness and accuracy of the available information, while the approaches based on sequence features discriminate disease genes from normal genes using only the statistical differences in sequence characteristics between all known disease genes and normal human genes, treating all diseases equally.
To tackle these problems, discovering more disease-specific characteristics of disease genes at the sequence level is a possible new direction. Codon usage is selected for this characteristic mining because it is known to be related to many aspects of a gene, such as its expression profile, subcellular location, and function. A novel method is proposed to extract the codon usage characteristics of disease genes. The results show that genes responsible for the same disease often use codons in a distinctive way: codon usage is similar among genes of the same disease but markedly different from that of other genes.
Based on this disease-specific codon usage characteristic, a novel approach to predicting human disease genes is developed.
Leave-one-out cross-validation is performed on 46 diseases, each with a p-value below 0.1 and at least 3 known disease genes, to evaluate the performance of the proposed method. The results indicate that 15% of the disease genes are correctly ranked as the highest-priority candidate among an average of 89 candidates in their chromosomal regions, about 1/3 of the disease genes rank within the top 3, and about 2/3 rank within the top 15.
Another advantage of this approach is that it is based solely on DNA sequences and can therefore identify potential disease genes whose functions and expression profiles are completely unknown. In addition, because the approach mines characteristics for each disease individually, it is disease-oriented and may produce fewer false positives.
Furthermore, a system is developed to discover new disease-related genes for human disorders by integrating new gene prediction, Gene Ontology prediction, and disease gene prediction. A large-scale analysis is performed for cardiomyopathies, and some meaningful results are available at http://infosci.hust.edu.cn.
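The abstract does not specify how codon usage features are extracted or how candidates are scored, so the following is only an illustrative sketch of the general idea, not the thesis's method. It assumes each gene is represented by the relative frequencies of its 64 codons, and that candidates are ranked by Euclidean distance to the mean profile of the known disease genes; both choices, and all function names, are hypothetical.

```python
from collections import Counter
from itertools import product
import math

# All 64 codons in a fixed order, used as the feature layout (an assumption).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(cds):
    """Return the relative frequency of each of the 64 codons in a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_SET)  # skip codons with N etc.
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_candidates(known_disease_cds, candidate_cds):
    """Rank candidate genes, closest to the disease codon usage profile first.

    known_disease_cds: list of coding sequences of known genes for one disease.
    candidate_cds: dict mapping gene name to coding sequence.
    """
    profile = centroid([codon_frequencies(s) for s in known_disease_cds])
    scored = [
        (euclidean(codon_frequencies(seq), profile), name)
        for name, seq in candidate_cds.items()
    ]
    return [name for _, name in sorted(scored)]


def leave_one_out_ranks(disease_cds, background_cds):
    """Hold out each known disease gene in turn and report its 1-based rank.

    disease_cds: dict of known disease genes (name to coding sequence); at least 2.
    background_cds: dict of other candidate genes in the same region.
    """
    ranks = {}
    for held_out, seq in disease_cds.items():
        training = [s for n, s in disease_cds.items() if n != held_out]
        candidates = dict(background_cds)
        candidates[held_out] = seq  # hide the held-out gene among the candidates
        ranks[held_out] = rank_candidates(training, candidates).index(held_out) + 1
    return ranks
```

Under these assumptions, figures such as "ranked first" or "within the top 3" correspond to counting how many held-out genes receive a rank of 1 or at most 3 from `leave_one_out_ranks`.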
Keywords/Search Tags: bioinformatics, feature discovery, codon usage, disease gene prediction