
Predicting Disease-causative Genes Based On Amino Acids Usage And Gene Functions

Posted on: 2009-06-24  Degree: Master  Type: Thesis
Country: China  Candidate: L Li  Full Text: PDF
GTID: 2120360278963897  Subject: Bio-IT
Abstract/Summary:
Finding morbid genes plays an important role in studying pathogenesis and in developing new methods of diagnosis and treatment. Approaches such as linkage analysis or genome-wide association locate a chromosome region that contains hundreds of candidate genes, and evaluating the risk that each of these candidates is a disease gene is one of the key problems in morbid gene finding. However, the approaches to this problem reported recently all have defects that prevent them from solving it well. Methods based on sequence features usually consider only the overall differences between disease genes and non-disease genes and do not distinguish the characteristics specific to different diseases; methods using GO functions can only predict genes that have exactly the same function as the morbid gene, and they treat all functions equally.

OMIM diseases that have at least 2 morbid genes are analyzed. The results show that the genes responsible for the same disease often have a distinctive amino acid usage: the usage is similar among genes of the same disease but remarkably different from that of other genes. To evaluate the method based on this disease-specific amino acid usage, leave-one-out cross validation is performed for 60 diseases, including Ehlers-Danlos syndrome, that have a p-value less than 0.1 and at least 2 known disease genes. The method ranks 16% of the morbid genes as the top candidate among the hundreds of candidate genes in the locus, which is more effective than PROSPECTR.

In addition, this work proposes a disease gene prediction tool, CDGMiner, which is based on the nearest path between GO terms in the DAG and weights GO terms according to their contribution to the disease. The validation results show that CDGMiner is able to rank 57% of the morbid genes in the top 5% of the candidates.

A test set containing 292 genes from 80 cancers is used to test CDGMiner's performance on cancers. Its performance turns out to be similar to that of the cancer gene prediction tool CGP.
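The amino acid usage ranking and its leave-one-out evaluation can be illustrated with a minimal Python sketch. The abstract does not state which distance measure or statistical test the thesis uses, so the Euclidean distance to the centroid of the known disease genes is an assumption made here for illustration only. The function names, gene names, and toy sequences are all hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_usage(seq):
    """Return the relative frequency of each standard amino acid in seq."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance between two usage vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of usage vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_candidates(known_seqs, candidate_seqs):
    """Order candidate names by closeness to the known genes' usage profile."""
    profile = centroid([aa_usage(s) for s in known_seqs])
    scored = [(euclidean(aa_usage(s), profile), name)
              for name, s in candidate_seqs.items()]
    return [name for _, name in sorted(scored)]


def leave_one_out_ranks(disease_seqs, background_seqs):
    """Hold out each known gene in turn and report its rank (1 = top)
    among the background candidates, using the remaining known genes."""
    ranks = {}
    for held_out, held_seq in disease_seqs.items():
        training = [s for n, s in disease_seqs.items() if n != held_out]
        candidates = dict(background_seqs)
        candidates[held_out] = held_seq
        ranking = rank_candidates(training, candidates)
        ranks[held_out] = ranking.index(held_out) + 1
    return ranks


# Hypothetical toy data: two glycine/proline-rich "disease" sequences
# and two unrelated background sequences from the same locus.
DISEASE = {"geneA": "GPPGPPGAPGPPGPQ", "geneB": "GPAGPPGPPGEPGPP"}
BACKGROUND = {"bg1": "MKLVLLAVEELKKRL", "bg2": "MSTWWCCHHFFYYNN"}

if __name__ == "__main__":
    print(leave_one_out_ranks(DISEASE, BACKGROUND))
```

In this sketch a held-out gene counts as a success when it receives rank 1, which corresponds to the "top rank candidate" criterion in the abstract. A real evaluation would use the full set of genes in the mapped locus as the background.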
Keywords/Search Tags: Bioinformatics, Morbid gene prediction, Sequence properties, Gene function