Random forests and gene selection to classify Arabidopsis thaliana ecotypes

Posted on: 2008-02-03
Degree: M.S.
Type: Thesis
University: Michigan State University
Candidate: Yeh, Hsueh-han
Full Text: PDF
GTID: 2440390005455005
Subject: Statistics
Abstract/Summary:
This thesis addresses classification and gene selection for Arabidopsis thaliana ecotype data. Expression values from oligonucleotide gene expression arrays were used to classify Arabidopsis thaliana ecotypes with statistical methods. Hierarchical clustering was used to group ecotypes according to latitude and altitude in order to distinguish them. Limma was used to select differentially expressed genes. The Random Forest algorithm provides a ranking of genes that indicates how well they discriminate between ecotypes.

We focus on the Random Forest algorithm. It is an efficient approach that can handle a large number of predictor variables in a classification problem. Its parameters are tuned to achieve a small classification error rate.

The genes in the final selection may play an important role in adaptation to stress conditions. They were further examined for gene function and for evidence regarding stress resistance.

Keywords: Arabidopsis thaliana, Microarray Data, Hierarchical Cluster, Limma, Random Forest, Classification.
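The gene-ranking idea in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: it is a simplified stand-in that replaces full decision trees with depth-one trees (stumps), uses only the Python standard library, and runs on synthetic data. The function names (`gini`, `best_split_gain`, `rank_genes`, `make_toy_data`), the toy data, and all parameter values are assumptions made for illustration. Each stump is grown on a bootstrap sample with a random subset of candidate genes, and a gene's importance is the total Gini decrease it contributes when it is chosen.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_gain(values, labels):
    """Largest Gini decrease over all thresholds on a single feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # the largest value cannot split
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best


def rank_genes(X, y, n_trees=200, seed=0):
    """Rank feature indices by Gini decrease accumulated over many stumps."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = max(1, int(p ** 0.5))  # candidate features per stump (assumed)
    importance = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        candidates = rng.sample(range(p), mtry)      # random feature subset
        yb = [y[i] for i in rows]
        gains = {
            j: best_split_gain([X[i][j] for i in rows], yb)
            for j in candidates
        }
        winner = max(gains, key=gains.get)
        importance[winner] += gains[winner]
    ranking = sorted(range(p), key=lambda j: importance[j], reverse=True)
    return ranking, importance


def make_toy_data(n_per_class=20, n_genes=10, seed=1):
    """Synthetic expression matrix: only gene 0 differs between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[0] += 3.0 * label  # shift gene 0 in the second class
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    ranking, importance = rank_genes(X, y)
    print("genes ordered by importance:", ranking)
```

In this toy setting the two class labels stand in for two ecotypes and the columns stand in for genes. A full random forest would grow deep trees and would usually report importance from out-of-bag data as well, so the sketch only conveys the ranking principle.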
Keywords/Search Tags: Arabidopsis thaliana, Random forest, Gene selection, Classification, Hierarchical cluster