Implementation and evaluation of scoring schemes for the automated discovery of nucleic acid structures

Posted on:2007-08-23

Degree:M.C.S

Type:Thesis

University:University of Ottawa (Canada)

Candidate:Anwar, Mohammad

GTID:2448390005466713

Subject:Computer Science

Abstract/Summary:

Recent experimental evidence has shown that RNA (ribonucleic acid) plays a greater role in various cellular functions than previously thought. With the increasing number of known RNA families, a need arises for computational techniques to analyze RNA sequences. A set of evolutionarily related RNA sequences, believed to contain signals at both the sequence and structure levels, can be exploited to detect motifs common to all or a portion of those sequences. Finding these shared structural features can provide substantial information as to which parts of the sequence are functional.

Recently, Nguyen (M.A.Sc. thesis, Electrical Engineering, University of Ottawa, 2004) introduced a novel approach for discovering consensus secondary structure motifs in a set of unaligned RNA sequences. The algorithm has been implemented in a software system called Seed. The aim of this thesis is to devise, implement and evaluate three scoring schemes for that software system. The first scoring scheme is the sum of the thermodynamic free energies, computed with the nearest neighbor model. The second is a general framework for the evaluation of RNA structures using statistical regression analysis. The third is based on the minimum description length principle.

We implemented and validated these scoring schemes on four data sets of varying complexity. The first two were derived from selected members of the UTRdb database, where the coding region is flanked by two untranslated regions (5' UTR and 3' UTR). The other two were assembled from a subset of the sequences used by Masoumi and Turcotte (IJBRA, 1(2), 230-245, 2005). By three measures (positive predictive value, sensitivity and Matthews correlation coefficient), our methods performed well on the data sets and showed significant ranking statistics. Our first method also compares favorably with the state-of-the-art tool RNAprofile. For small motifs, the scoring methods are able to rank motifs with high PPV and sensitivity, often 100%. The top-ranked motifs were used as input constraints for MFOLD, a widely used tool for RNA secondary structure determination; the constrained foldings showed improvements in both PPV and sensitivity.
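
The three evaluation measures named in the abstract have standard definitions in terms of true/false positives and negatives. The following is a minimal sketch of those textbook formulas, not the thesis's own evaluation code. The function name `motif_metrics` and the choice to count over predicted versus reference base pairs are assumptions made for illustration; the abstract does not say how the counts were obtained.

```python
import math


def motif_metrics(tp, fp, fn, tn):
    """Return (PPV, sensitivity, MCC) from confusion-matrix counts.

    Assumption: tp/fp/fn/tn are counts of base pairs (or positions)
    classified against a reference structure. This is a hypothetical
    helper; the thesis may define the counts differently.
    """
    # Positive predictive value: fraction of predictions that are correct.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # Sensitivity: fraction of reference items that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient, defined as 0 when the
    # denominator vanishes (a common convention).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return ppv, sensitivity, mcc


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    print(motif_metrics(tp=8, fp=2, fn=2, tn=8))
```

Under these definitions a motif reported with 100% PPV and sensitivity corresponds to `fp == 0` and `fn == 0`.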

Keywords/Search Tags:

RNA, Scoring schemes, Structure

Related items

1	An RNA Scoring Function For Tertiary Structure Prediction Based On Multi-layer Neural Networks
2	The Design And Implementation Of The Chinese Writing Composition Scoring Suggestion System For Senior High School Entrance Examination
3	Research On Automated Essay Scoring Method For Junior High School English
4	The Research And Application Of Automatic Scoring System Based On Abstract Syntax Tree
5	Automated essay evaluation and the computational paradigm: Machine scoring enters the classroom
6	Design And Implementation Of Automated Scoring System For English Non-essay Writing Questions
7	Fuzzified scoring of the functional assessment instrument
8	Towards Structure Learning Of CP-nets
9	Development Of HawkRank:a New Scoring Function For Protein-protein Docking Based On Weighted Energe Terms
10	Study Of Target-Scoring Algorithm Based On Image Recognition In Automatic Scoring System For Shooting Sports