
Genome Annotation Using Data Mining Techniques

Posted on: 2011-05-27
Degree: M.C.S
Type: Thesis
University: University of New Brunswick (Canada)
Candidate: Zhang, En
Full Text: PDF
GTID: 2448390002461777
Subject: Biology
Abstract/Summary:
Manual annotation of large volumes of genomic data by human curators demands substantial time and resources, so data-mining techniques have become prevalent in genome annotation. Selecting appropriate data-mining techniques is as vital as the annotation itself, and comparing techniques on specific genomic data is necessary to determine how best to annotate it. In these experiments, a collection of protein sequences from yeast, E. coli, and Arabidopsis is analyzed with several data-mining techniques. The experiments enable the discovery of characteristics of both the protein sequences and the data-mining techniques through the annotation process. WEKA, an open-source software suite that implements popular data-mining techniques, is used to learn the annotation of protein sequences. The characteristics of both the protein sequences and the techniques are determined from the annotation results, by comparing the performance of the selected techniques and by analyzing the protein sequences.
Keywords/Search Tags: Annotation, Techniques, Data, Protein sequences