Annotating large volumes of genomic data by hand requires substantial curator time and resources, so data-mining techniques have become prevalent in genome annotation. Selecting appropriate data-mining techniques is as vital as the annotation itself, and comparing techniques on specific genomic data is necessary to determine how best to annotate it. In these experiments, a collection of protein sequences from yeast, E. coli, and Arabidopsis is analyzed with several data-mining techniques. WEKA, an open-source software suite that implements popular data-mining techniques, is used to learn the annotation of the protein sequences. Characteristics of both the protein sequences and the data-mining techniques are then determined from the annotation results, by comparing the performance of the selected techniques and by analyzing the protein sequences themselves.