
Discovering functional annotation through data mining of large scale phenomics in Arabidopsis thaliana

Posted on: 2013-07-01 | Degree: Ph.D. | Type: Dissertation
University: Michigan State University | Candidate: Bell, Shannon Marie
GTID: 1458390008976960 | Subject: Chemistry
Abstract/Summary:
To address society's biotechnology needs in agriculture, medicine, and beyond, we need a better understanding of how information flows from gene to protein to phenotype. Genome-scale (omic) data keep accumulating, yet gene function remains poorly annotated, and this gap is still a challenge for researchers. Missing functional annotation can slow progress in work ranging from targeted metabolic engineering to foundational biological research. Vague annotations based on an expression profile or on sequence similarity make it hard to design experiments that characterize a gene, and they can lead researchers down the wrong path. Large-scale phenomics can provide more useful information to guide researchers in characterizing under-annotated genes. Unfortunately, many of the tools needed to analyze large-scale phenotypic data are lacking.

This work presents a suite of software tools developed to address this need. MIPHENO introduces a workflow for the post hoc analysis of screening data. The workflow covers quality control, normalization, and the prediction of individuals likely to show a response. The NetComp suite features an algorithm, SimMeasure, that calculates the similarity between individuals in the presence of missing data. SimMeasure also works with datasets that have been thresholded to remove values below or above a given response value. The suite also includes several additional functions for data integration and network comparison.

The tools were applied to a large phenotypic screen of gene disruption lines in Arabidopsis thaliana, and the results demonstrate their usefulness for analyzing large-scale datasets. They show that phenotypic data can be used to build models of gene function in the same way as other high-throughput data. This may be the first example of using high-throughput phenotypic data in higher organisms to build models for functional annotation.
Together, this work presents the next step in the analysis of omics data and moves the field closer to improving annotation quality.
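The abstract does not describe how SimMeasure works internally. The Python sketch below is a purely hypothetical illustration of the problem it addresses: comparing phenotype profiles when some measurements are missing or have been thresholded away. It uses two standard stand-ins. The first is Pearson correlation computed only over traits observed in both profiles (pairwise-complete). The second is a Jaccard index over the traits that survive thresholding. Several details are assumptions for illustration: the function names, the use of `None` for a missing value, the `min_overlap` parameter, and the magnitude cutoff. This is not the published SimMeasure algorithm.

```python
"""Hypothetical sketch: similarity between phenotype profiles with gaps.

NOT the published SimMeasure algorithm. Missing or thresholded-out
measurements are encoded as None (an assumption for illustration).
"""
from math import sqrt
from typing import List, Optional, Sequence

Profile = Sequence[Optional[float]]


def threshold(profile: Profile, cutoff: float) -> List[Optional[float]]:
    """Mask values whose magnitude is below `cutoff` (treated as 'no response')."""
    return [v if v is not None and abs(v) >= cutoff else None for v in profile]


def pairwise_complete_similarity(
    a: Profile, b: Profile, min_overlap: int = 3
) -> Optional[float]:
    """Pearson correlation over traits observed in BOTH profiles.

    Returns None when fewer than `min_overlap` traits are shared, or when
    either profile is constant over the shared traits (correlation undefined).
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same traits")
    # Keep only trait positions where neither profile is missing.
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < min_overlap:
        return None
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def shared_response_similarity(a: Profile, b: Profile) -> float:
    """Jaccard index of the traits that still carry a value in each profile."""
    hits_a = {i for i, v in enumerate(a) if v is not None}
    hits_b = {i for i, v in enumerate(b) if v is not None}
    union = hits_a | hits_b
    return len(hits_a & hits_b) / len(union) if union else 0.0


if __name__ == "__main__":
    line_a = [1.0, 2.0, None, 4.0, 5.0]
    line_b = [2.0, 4.0, 3.0, None, 10.0]
    # Shared traits are positions 0, 1 and 4, where line_b = 2 * line_a.
    print(pairwise_complete_similarity(line_a, line_b))
    print(shared_response_similarity(threshold(line_a, 3.0), threshold(line_b, 3.0)))
```

In a screen, each gene disruption line would supply one profile. Scoring every pair of lines gives a similarity matrix, which could then be cut at some value to form a network. Requiring a minimum number of shared traits guards against strong correlations computed from very few observations.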
Keywords/Search Tags: Data, Annotation, Large scale, Gene