Font Size: a A A

Design And Implementation Of Prediction Algorithm For RNA Secondary Structure With Pseudoknots

Posted on:2011-08-06Degree:MasterType:Thesis
Country:ChinaCandidate:J T ChaoFull Text:PDF
GTID:2120360305985588Subject:Ecology
Abstract/Summary:PDF Full Text Request
Generally, the prediction accuracy of RNA pseudoknotted secondary structure carried out by current programs is very low for various reasons. However, after further study of various prediction algorithms and energy models, we found that there still was much space of improvement for prediction, In this article, we designed and implemented a novel heuristic algorithm, mergeStem, which can be used to predict the RNA secondary structure with pseudoknots effectively and finally may provide more reliable secondary structures for researchers to refer to.Among all current prediction algorithms, heuristic algorithms could predict pseudoknots very well and have no limitation of homologous sequences. Through following-up in-depth analysis, we found the prediction accuracy of heuristic algorithms depended on two key factors: (1) the range of local search; (2) pseudoknot energy model. Thus, here we carried out our work mainly concerning with these two key factors, briefly below:(1) Sometimes, on one hand, owing to its imperfection, current used energy model misleads the prediction; on the other hand, the prediction accuracy of algorithm first quickly increases and then slightly decreases with the expansion of local search range. This fact indicates that choosing a reasonable range of local search will further improve the prediction of algorithms. Specially, the range of local search has two layers of meaning: stem search range and structure remained range. So we re-estimated the size of these two ranges through the statistical analysis of a large amount of known RNA secondary structures. Then the estimated ranges would be helpful to improve the predictions of mergeStem.(2) In D&P model, pseudoknots energy calculation is mainly based on the multiloops'calculation mode, which only counts the total number of paired bases and unpaired bases. Although it varies different penalties for different types of pseudoknots, it cannot distinguish those pseudoknots with different structures but with same type. Thus we introduced new penalties for Gaps and Short stems by investigating enormous amount of pseudoknots, and the new D&P model would be beneficial for the predictions of mergeStem.We designed the mergeStem algorithm based on the above studies, and implemented it into a Java software mergeStem v1.0, whose interface was simple, intuitive and easy to use. In addition, the software was convenient for user to modify the local search range and different parameters for multiloops and pseudoknots. The output of the program is common used CT format file, which could be drawn into a visualized picture of RNA secondary structure by PseudoViewer3.0 or its web application.To verify this method, we chose 350 free-pseudoknot RNA sequences and 258 pseudoknotted RNA sequences from reliable sources or literatures. The results obtained by our method were compared with pknotsRG, FlexStem, HotKnots, PKNOTS and ILM, which were wide used algorithms. The comparison results were as follows: (1) The mergeStem showed significantly better than other five algorithms for 258 pseudoknotted sequences: The average sensitivity of mergeStem on the dataset of PK31, PL7 and PK220 is respectively 6%, 9% and 6% higher than the second-best one; the average specificity was respectively 4%, 17% and 1% higher; the positive control (PC) was respectively 4, 2 and 21 pseudoknots more. (2) The mergeStem performed slightly better than other five algorithms for 350 non-pseudoknot sequences: the average sensitivity of mergeStem on dataset of tRNA-200 and 5S rRNA-150 was respectively 1% and 2% higher than the second-best one; the average specificity is 3% and 6%; although the negative control (NC) of mergeStem didn't perform as well as pknotsRG and PKNOTS, but better than other algorithms.To evaluate the running time of mergeStem on computer, ten sequences of different length were selected. The running time of mergeStem was only slightly slower than pknotsRG and FlexStem except obviously slower than ILM. Compared with other internationally popular algorithms it can be drawn that mergeStem has largely increased prediction accuracy while maintaining similar running efficiency.In summary, the mergeStem gave a strong impetus to heuristic algorithm of prediction of RNA secondary structure including pseudoknots and significantly improved the prediction accuracy of pseudoknots. The mergeStem v1.0 could be a more reliable support tool for bioinformatics researchers. In the course of its implementation, once the length of sequence exceeded 500nt, the prediction would not be stable any more though this algorithm had the capability to predict RNA sequence containing pseudoknots of about 1500nt.
Keywords/Search Tags:heuristic algorithm, prediction, pseudoknot, RNA secondary structure
PDF Full Text Request
Related items