
Nearest neighbour classification improves side effect machine performance for sequence categorization

Posted on: 2011-07-22
Degree: M.Sc.
Type: Thesis
University: University of Guelph (Canada)
Candidate: McEachern, Andrew
GTID: 2448390002453289
Subject: Mathematics
Abstract/Summary:
The explosion of available sequence data necessitates the development of sophisticated machine learning tools with which to analyze them. This thesis presents improvements to a sequence-learning technology called side effect machines, as well as some mathematical theory about a lower bound on resources. It also applies a model of evolution that simulates the evolution of a ring species to the training of side effect machines, and compares side effect machines evolved in the ring structure with side effect machines evolved using a standard evolutionary algorithm. The core of the improvement to the training of side effect machines is a nearest neighbour classifier. A parameter study was performed to investigate the impact of dividing the training data into examples used for nearest neighbour assessment and training cases. The parameter study demonstrates that parameter setting is important in the baseline runs, whereas the ring optimization runs were strongly robust to parameter changes. The ring optimization technique was also found to give better and more reliable training performance. Side effect machines are tested on three types of synthetic data: one based on GC-content, one that tests the ability of side effect machines to recognize an embedded motif, and one generated by self-driving Markov automata. Two types of biological data, a data set of different types of immune-system genes and a data set of normal and retrovirally derived human genomic sequences, are classified with excellent accuracies...
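The abstract does not spell out how a side effect machine is combined with a nearest neighbour classifier, so the following is only a minimal sketch under stated assumptions. It assumes the common description of a side effect machine as a finite-state machine driven by the sequence, with one counter per state that is incremented on each visit; the counts, divided by sequence length (an assumption, so sequences of different lengths are comparable), form a feature vector. A query is then given the label of the nearest exemplar under Euclidean distance (a 1-nearest-neighbour rule, also an assumption), and the fraction of training cases labelled correctly could serve as a fitness value. The names `SideEffectMachine`, `nearest_label` and `accuracy` are hypothetical, and the evolutionary training (standard or ring) is not shown.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Example = Tuple[str, str]  # (DNA sequence, class label)


class SideEffectMachine:
    """Hypothetical finite-state machine whose per-state visit counts are features."""

    def __init__(self, transitions: List[Dict[str, int]], start: int = 0) -> None:
        # transitions[state][symbol] is the next state (assumed encoding).
        self.transitions = transitions
        self.start = start

    def features(self, seq: str) -> List[float]:
        """Drive the machine with seq; return visit counts divided by sequence length."""
        counts = [0] * len(self.transitions)
        state = self.start
        for symbol in seq:
            state = self.transitions[state][symbol]
            counts[state] += 1  # the "side effect": count each state visit
        n = max(len(seq), 1)  # avoid division by zero on an empty sequence
        return [c / n for c in counts]


def nearest_label(sem: SideEffectMachine, exemplars: Sequence[Example], seq: str) -> Optional[str]:
    """Assumed 1-NN rule: label of the exemplar closest in feature space."""
    x = sem.features(seq)
    best_label, best_dist = None, math.inf
    for ex_seq, label in exemplars:
        d = math.dist(x, sem.features(ex_seq))
        if d < best_dist:  # strict '<' keeps the first exemplar on ties
            best_label, best_dist = label, d
    return best_label


def accuracy(sem: SideEffectMachine, exemplars: Sequence[Example], cases: Sequence[Example]) -> float:
    """Fraction of training cases labelled correctly (one possible fitness value)."""
    correct = sum(nearest_label(sem, exemplars, s) == y for s, y in cases)
    return correct / len(cases)


# Toy two-state machine: state 1 after G or C, state 0 after A or T,
# so the features are simply (AT fraction, GC fraction).
gc_sem = SideEffectMachine([
    {"A": 0, "T": 0, "G": 1, "C": 1},
    {"A": 0, "T": 0, "G": 1, "C": 1},
])
exemplars = [("GCGCGC", "high-GC"), ("ATATAT", "low-GC")]
cases = [("GGCCGA", "high-GC"), ("AATTAC", "low-GC")]

print(gc_sem.features("GGCA"))                    # [0.25, 0.75]
print(nearest_label(gc_sem, exemplars, "GGCA"))   # high-GC
print(accuracy(gc_sem, exemplars, cases))         # 1.0
```

The hand-built machine above only recovers GC-content, echoing the first synthetic data set; in an evolutionary setting the transition table would instead be evolved, and the split between exemplars and training cases is the kind of parameter the abstract's parameter study examines.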
Keywords/Search Tags: Side effect, Sequence, Data, Nearest