Font Size: a A A

RNA Pseudoknots Prediction Algorithm Based On Stack

Posted on:2009-08-19Degree:DoctorType:Dissertation
Country:ChinaCandidate:H W LiFull Text:PDF
GTID:1118360245494528Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
RNA structure prediction is one of the basic issues in computational biology. RNA secondary structure prediction is the first step to predict RNA tertiary structure from RNA sequence.Comparative sequence analysis method was adopted in early period,which gets secondary structure by comparing RNA sequences of the same biological function in different organisms.Homologous sequences are hard to get for many RNA molecules,and a lot of cost of manpower is needed,so the predicted efficiency of comparative sequence analysis method is low.Minimum free energy method is adopted widely to predict RNA secondary structure now.Minimum free energy method searches the structure with minimum free energy from all conformations formed by the given sequence based on thermal dynamic model.MFOLD algorithm,presented by Zuker,has become the most widely used optimal methods for RNA secondary structure prediction,which has the predicted accuracy of about 73%.But MFOLD algorithm can not predict pseudoknots and more complex tertiary structure.Pseudoknot is the most extensive RNA tertiary structural element,very complicated and stable RNA structure.Pseudoknots prediction is the key and hot point to the research of RNA structure prediction now.Pseudoknots have regulative, catalytic and structural important function and diversity roles in different RNA molecules.It has been proved to be NP-hard class that predicting RNA secondary structure containing arbitrary pseudokonts.More discussions on RNA structure prediction containing pseudoknots are centered on minimum free energy algorithms to predict simple pseudokonts. PKNOTS algorithm,presented by Rivas and Eddy,takes O(n~6)time and O(n~4)space to predict arbitrary planar pseudoknots and partial nonplanar pseudoknots;PknotsRG algorithm,presented by Jens and Robert,takes O(n~4)time and O(n~2)space to predict simple nested pseudoknots;LP algorithm,presented by Lyngsφand Pedersen,takes O (n~5)time and O(n~3)space to predict one planar pseudoknot.Combinatorial optimization algorithms fold RNA pseudokonts through simplifing energy model to consider partial main acition,but it sacrifices predicted accuracy,such as maximum weighted matching algorithm presented by Cary and Storm.There are some computer attempts to predict pseudoknots with heuristic approaches include genetic algorithms,simulated annealing algorithms and neural network algorithms.Adjacent base pairs form stack,stacking and base pairing actions in RNA molecules are the most primary and stable actions.Maximum stacking problem has also attracted close attention in RNA secondary structure prediction containing pseudoknots.Ieong presented the problem of maximum stacked base pairing number in 2001,designed its approximate algorithm with the approximate performance ratio of 3,and Lyngsφgave its exact algorithm.Moreover Lyngsφpresented the problem of maximum stacking number,proved this problem belongs to NP-hard class and designed its polynomial time approximate scheme.Lyngsφtreat all stackes as the same,but RNA structural experimental results indicate that the different types of stackes have the different energy,and the energy of stack is determined by the type of its base pairs.So we present maximum weighted stacking problem based on biological stacking and base pairing actions,and discuss its solving algorithm and complexity.We divide stackes into(AA,BB)and(AB,BA)classes,compute the optimal structure of(AB,BA)class by the property of(AB,BA)class constructing clique, calculate the approximate structure of(AA,BB)class by the property of stack partion, get the approximate structure by selecting the one with the maximum weight from the weight of(AA,BB)and(AB,BA)classes,and give polynomial time approximation algorithm to predict arbitrary pseudoknots with O(nlogn)time and O(n)space.The approximate performance ratio of this algorithm is 3.We divide the given sequence into subsequences with the length less than K+ 1(2≤K),calculate and store the paired weight and unpaired number of various subsequences,delete impossible structure and the structure with smaller weight by daynamic programming,seach the optimal structure formed by subsequences with the length less than K+1 as the approximate structure of given sequence,and give a polynomial time approximate scheme to predict arbitrary pseudoknots.Now it is hard to fold large RNA molecules for existing polynomial time pseudoknots prediction algorithms.As continuous stackes construct stem and crossed stems construct pseudoknots,search the optimal structures based on combination with stem zones has become new method to RNA structure prediction,such as stem zone stacking algorithm presented by Benedeti and stem zone random stacking algorithm presented by Li WJ,predict nested secondary structures.Recently Ruan give a heuristic algorithm based on stem zone to predict secondary structure containing pseudoknots with O(n~4)time and O(n~2)space.We design a heuristic algorithm to maximize stem and predict arbitrary pseudoknots.First we search all stems of given sequence,select the largest stem with the minimal energy,and mark the bases in the selected stem to make them don't take part in back pairing.Then we select the largest stem from the remainder bases,and follow on until no stem.It takes O(n~3)time and O(n)space to predict RNA sequences with 5000 bases.Our algorithm replace base pair with stern,and replace delete with mark,which remove lots of redundancy base pair and nonsense stack,and increase predicted accuracy.Compared with maximum weighted matching algorithm with O(n~3)time and O(n~2)space,our algorithm reduce space complexity from O(n~2)to O(n);and the experimental results show that its sensitivity is improved form 80%to 87.8%,and specificity is increased from 53.7%to 75.9%.Compared with genetic algorithm with the accuracy of 83.3%and simulated annealing algorithm with the accuracy of 79.7%,our algorithm increases the predicted accuracy to 87.5%.The key to RNA structure prediction with minimal free energy method is modeling RNA structure.Based on the characteristic of relative stablisity of RNA stem zone,we introduce semi-extensible structures and k-stems,compute semi-extensible structures with k-stems,calculate nested and crossed pseudoknots with the crossing of semi-extensible structures,and establish a new model to express pseudoknots.Planar pseudoknot is the most extensive subclass of pseudoknots.In Pseudobase, there is only one nonplanar pseudoknot,and others are all planar pseudoknots.Based on new model,we introduce coaxial stacks actions,design and implement dynamic programming algorithm to predict arbitrary planar pseudoknots and simple nonplanar pseudoknots.All 254 pseudoknots in the PseudoBase are computed with this algorithm,the predicted structures of 173 sequences contain correct pseudoknots,the forecast accuracy is 70.6%;the average sensitivity is 83.6%and the average specificity is 76.6%.For 84 sequences of 3-UTR other virus,the forecast accuracy is 84.5%,the average sensitivity is 91.0%and the average specificity is 91.2%. Experimental results show that this algorithm has good accuracy.Compared with the best RE algorithm,the length of the given sequence computed by our algorithm is improved from 140 bp to 800 bp,the time complexity is reduced from O(n~6)to O(n~4)and the space complexity is droped from O(n~4)to O(n~3); for PKNOTS tested sequences,the sensitivity is increased from 82.8%to 84.8%and the specificity is raised from 78.9%to 80.7%.Compared with PknotsRG algorithm with O(n~4)time and O(n~2)space to compute simple nested pseudoknots,our algorithm can compute more complex nested and crossed pseudoknots with the same time complexity as PknotsRG algorithm.The test results for PseudoBase data base indidcate that PknotsRG algorithm has the predicted accuracy of 68%and our algorithm has the one of 70.6%for pseudoknots. Compared with LP algorithm to compute one planar pseudoknot with O(n~5)time and O(n~3)space,our algorithm can compute more complex multi-pseudoknots,and reduce the time complexity from O(n~5)to O(n~4).There are five parts of contents and results in this paper.The basic concepts, research significance and status are summarized in the first chapter.Existing RNA structure prediction algorithms for nested structure and that for pseudoknots are reviewed and summarized in the secondary chapter.Polynomial approximation algorithm and approximation scheme for maximum weighted stacking problem are designed in the third chapter.Heuristic algorithm to maximize stem and its experimental results are given in the fourth chapter.Dynamic programming algorithm and its experimental results are given to predict planar pseudoknots in the fifth chapter. The end is a brief summary.There are three innovation points in this paper as follows.(1)Based on stacking and base pairing actions,we present maximum weighted stacking problem and design its polynomial time approximation algorithm with O(nlogn)time and O(n)space and its polynomial time approximation scheme. The approximate performance ratio of this approximation algorithm is 3.(2)We design a heuristic algorithm to maximize stem with O(n~3)time and O(n) space to predict RNA sequences with 5000 bases.Compared with maximum weighted matching algorithm with O(n~3)time and O(n~2)space,our algorithm reduce space complexity from O(n~2)to O(n),and the experimental results show that its sensitivity is improved form 80%to 87.8%,and specificity is increased from 53.7%to 75.9%.(3)We establish a new model to express pseudoknots including semi-extensible structures and k-stems,design a dynamic programming algorithm to predict arbitrary planar pseudoknots and simple nonplanar pseudoknots.The predicted sensitivity is 83.6%and the predicted specificity is 76.6%for PseudoBase data base.Compared with the best RE algorithm to predict arbitrary planar pseudoknots and partial nonplanar pseudoknots,the length of the given sequence computed by our algorithm is improved from 140 bp to 800 bp,the time complexity is reduced from O(n~6)to O(n~4)and the space complexity is droped from O(n~4)to O(n~3);for PKNOTS tested sequences,the sensitivity is increased from 82.8%to 84.8%and the specificity is raised from 78.9%to 80.7%.
Keywords/Search Tags:RNA Structure Prediction, Pseudoknots, Computational Biology, Approximation Algorithm, Approximation Scheme, Dynamic Programming
PDF Full Text Request
Related items