
A machine learning approach to prediction of RNA editing events

Posted on: 2011-08-22
Degree: M.S.
Type: Thesis
University: Lehigh University
Candidate: Stoev, Ivan
Full Text: PDF
GTID: 2448390002950219
Subject: Computer Science
Abstract/Summary:
Adenosine-to-inosine (A-to-I) RNA editing is a post-transcriptional process that alters the RNA molecule. It is important to study this process because deficiency or misregulation of A-to-I RNA editing may be the cause of neurological diseases. However, the RNA editing machinery is still poorly understood, and the number of known recoding editing substrates remains limited. The goal of this thesis is to develop a machine learning approach to predicting novel editing sites based on a variety of features. The thesis details and implements the Support Vector Machine (SVM) classification algorithm with support for graph and string kernels. The graph kernels enable machine learning from RNA foldback structures, that is, secondary structures computed by the RNA Editing Dataflow System (REDS). String kernels allow for learning based solely on nucleotide sequence features. Multiple classifiers are designed and evaluated with training data from experimental lab work done at Lehigh University. In addition, because it is difficult to determine a truly negative class (sites that never undergo editing), experiments with the single-class SVM were run on some of the classifiers. Our results indicate that the mismatch kernel [Leslie et al., 2004] classifier generalizes best among all the classifiers we tested. The mismatch kernel classifier achieved a precision of 0.88 and a sensitivity of 0.82 in leave-one-out cross-validation tests. Using this classifier, we suggest new high-confidence RNA editing candidate sites that could later be verified experimentally in the lab.
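As an illustration of the string-kernel idea mentioned in the abstract, the following is a minimal sketch of a (k, m)-mismatch kernel in the spirit of Leslie et al., 2004. It is not the thesis implementation: the function names, the RNA alphabet `ACGU`, and the default values `k=3` and `m=1` are assumptions chosen for illustration, and the brute-force neighborhood enumeration is only practical for small k and m.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACGU"  # assumed RNA alphabet; the thesis may encode sequences differently


def neighbors(kmer, m):
    """Return all k-mers over ALPHABET within Hamming distance <= m of kmer."""
    result = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(ALPHABET, repeat=d):
                candidate = list(kmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                # Substituting a letter with itself yields a closer neighbor,
                # which is still within distance m, so the set stays correct.
                result.add("".join(candidate))
    return result


def mismatch_features(seq, k, m):
    """Map a sequence to counts over k-mers, crediting each observed k-mer
    to every k-mer in its m-mismatch neighborhood."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        for neighbor in neighbors(seq[i:i + k], m):
            features[neighbor] += 1
    return features


def mismatch_kernel(s, t, k=3, m=1):
    """Kernel value: inner product of the two mismatch feature vectors."""
    fs = mismatch_features(s, k, m)
    ft = mismatch_features(t, k, m)
    return sum(count * ft[kmer] for kmer, count in fs.items())
```

Under these assumptions, evaluating `mismatch_kernel` on every pair of training sequences gives a Gram matrix, which could then be passed to an SVM implementation that accepts precomputed kernels (for example, scikit-learn's `SVC(kernel="precomputed")`).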
Keywords/Search Tags:RNA editing, Machine learning approach