Genome Annotation Using Data Mining Techniques

Posted on:2011-05-27

Degree:M.C.S

Type:Thesis

University:University of New Brunswick (Canada)

Candidate:Zhang, En

GTID:2448390002461777

Subject:Biology

Abstract/Summary:

Annotating large volumes of genomic data by hand requires huge amounts of curator time and resources, so data-mining techniques have become prevalent in genome annotation. Selecting an appropriate data-mining technique is as vital as the annotation itself, and determining how best to annotate requires comparing techniques on specific genomic data. In these experiments, a collection of protein sequences from yeast, E. coli and Arabidopsis is analyzed with several data-mining techniques. WEKA, an open-source software package that implements popular data-mining techniques, is used to learn the annotation of the protein sequences. Characteristics of both the protein sequences and the data-mining techniques are then determined from the annotation results, by comparing the performance of the selected techniques and by analyzing the protein sequences.
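The kind of comparison the abstract describes can be illustrated with a minimal sketch. It is not the thesis's method: the thesis uses WEKA, while this example uses plain Python, and the feature choice (amino-acid composition), the two classifiers (1-nearest-neighbour and nearest-centroid), the leave-one-out evaluation, and the six toy sequences with their labels are all assumptions made up purely for illustration.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino-acid frequency vector of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor(train, x):
    """1-NN: predict the label of the closest training vector."""
    return min(train, key=lambda item: distance(item[0], x))[1]


def nearest_centroid(train, x):
    """Predict the label of the class whose mean vector is closest."""
    by_label = {}
    for vec, label in train:
        by_label.setdefault(label, []).append(vec)
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }
    return min(centroids, key=lambda label: distance(centroids[label], x))


def leave_one_out_accuracy(classifier, data):
    """Hold out each example in turn, train on the rest, and score the hits."""
    correct = 0
    for i, (vec, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if classifier(train, vec) == label:
            correct += 1
    return correct / len(data)


# Hypothetical toy data: invented sequences and labels, not real proteins.
TOY_DATA = [
    ("AAAALLLLAAVVLLAA", "hydrophobic"),
    ("LLVVAAILLLAAVVII", "hydrophobic"),
    ("AILVAILVAILVAAIL", "hydrophobic"),
    ("KKEEDDKKRREEDDKK", "charged"),
    ("DEKRDEKRDDEEKKRR", "charged"),
    ("RRKKEEDDEEKKRRDD", "charged"),
]


def compare(data=TOY_DATA):
    """Score each classifier on the same feature vectors."""
    vectors = [(composition(seq), label) for seq, label in data]
    classifiers = [("1-NN", nearest_neighbor),
                   ("nearest-centroid", nearest_centroid)]
    return {name: leave_one_out_accuracy(clf, vectors)
            for name, clf in classifiers}


if __name__ == "__main__":
    print(compare())
```

Both classifiers classify this toy set perfectly because the two invented classes share no residues; real annotation data would separate the techniques far more clearly, which is the point of comparing them on specific genomic data.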

Keywords/Search Tags:

Annotation, Techniques, Data, Protein sequences

Related items

1	Designing A Protein Annotation System Based On Local Nr Database
2	Protein surface analysis by Dimension Reduction with applications in functional annotation and drug target prediction
3	Multiple alignments of protein structures and their application to sequence annotation with hidden Markov models
4	Research On Key Techniques Of Protein-protein Interaction Extraction
5	Data Mining And Its Application On Protein Structure And Function Prediction
6	Research On Algorithms For Identifying Protein Complexes Based On Protein Network
7	Protein-protein Interaction Network Inference From Mass Spectrometry Data
8	ECPF:An Efficient Algorithm For Expanding Clustered Protein Families
9	The Research Of Protein-Protein Extraction In Biomedical Literature
10	Research On Genic Function By Clustering On Protein Network