
Comparison and evaluation of statistical-learning methods for gene function prediction in Arabidopsis thaliana

Posted on: 2006-05-16
Degree: M.Sc
Type: Thesis
University: University of Toronto (Canada)
Candidate: Lan, Hui
Full Text: PDF
GTID: 2450390008460165
Subject: Computer Science
Abstract/Summary:
Genome sequencing of Arabidopsis thaliana, completed in 2000, revealed approximately 30,000 genes. However, about half of these genes have not yet been assigned a function. The goal of this study is to identify genes of unknown function that are potentially involved in plant responses to stress. We evaluated and compared five basic statistical learning methods for genome-wide gene function prediction from gene expression data. None of these methods was uniformly better than the others. We also investigated combining the methods for prediction. The combined method achieved better classification performance than the basic methods for the top "response to stress" function. At a precision above 50%, we identified a considerable number of unknown genes that are potentially stress-associated; these are currently being validated by biologists.
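
The abstract does not say how the five methods were combined or how the precision cutoff was applied. As a minimal sketch only, assuming the combination is a plain average of per-gene scores and that a gene is called stress-associated when its combined score reaches a fixed threshold, the idea could look like the following. The function names, the averaging rule, the threshold, and the toy scores are all hypothetical and are not taken from the thesis.

```python
def combine_scores(score_lists):
    """Average per-gene scores across classifiers.

    Assumption: simple averaging; the thesis may use a different rule.
    """
    n = len(score_lists)
    return [sum(gene_scores) / n for gene_scores in zip(*score_lists)]


def precision(scores, labels, threshold):
    """Fraction of genes predicted positive that are truly positive."""
    predicted = [s >= threshold for s in scores]
    true_pos = sum(1 for p, y in zip(predicted, labels) if p and y)
    false_pos = sum(1 for p, y in zip(predicted, labels) if p and not y)
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


# Toy scores from two hypothetical classifiers for five genes.
scores_a = [1.0, 0.25, 0.75, 0.5, 0.0]
scores_b = [0.75, 0.75, 0.5, 0.0, 0.25]
# Hypothetical ground truth: True means annotated "response to stress".
labels = [True, False, True, False, False]

combined = combine_scores([scores_a, scores_b])
p = precision(combined, labels, threshold=0.5)
```

In a setting like the one described, the threshold would typically be chosen on genes with known annotations so that precision stays above the target level, and the same threshold would then be applied to the unannotated genes.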
Keywords/Search Tags:Gene, Methods, Function, Prediction