Font Size: a A A

Studies On Algorithms For RNA Structure Prediction Including Pseudoknots

Posted on:2015-02-27Degree:DoctorType:Dissertation
Country:ChinaCandidate:Z D LiuFull Text:PDF
GTID:1260330431455356Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
RNA(Ribonucleic Acid) play an important biological function as a biomacromolecule, RNA structure prediction is one of the basic issues in computational molecule biology, which is also a hot spot of international research. Many of these problems are NP-hard, so people seek for approximation algorithms instead of design polynomial algorithms, and as well as to guide biological practice.RNA tertiary structure is more stable structure, and RNA secondary structure prediction is the first step to predict RNA tertiary structure in RNA sequence. The principal methods of RNA secondary structure prediction is:comparative sequence analysis method and minimum free energy method. Comparative sequence analysis method was adopted in early period, which gets secondary structure by comparing RNA sequences of the same biological function in different organisms,but homologous sequences are hard to get for many RNA molecules, and a lot of cost for manpower is needed, so the predicted efficiency of comparative sequence analysis method is low. Minimum free energy method is adopted widely to predict RNA secondary structureMfold algorithm,which was presented by Zuker, has become the most widely used optimal methods for RNA secondary structure prediction,the essence of minimum free energy method is searching the RNA structure with minimum free energy from all conformations formed by the given RNA sequence based on thermal dynamic model. Mfold algorithm has the predicted accuracy of about70%, but it can not predict pseudoknots and more complex tertiary structure,so the algorithm is limited to use in predicting RNA structure. Pseudoknots is the most extensive RNA tertiary structure, very complicated and stable RNA structure. RNA pseudoknotted prediction is the key and hot point to the research of RNA structure prediction now. pseudoknots have function of constructure, interference, catalyaing and regulation, and play diversity roles in different RNA molecules. It has been proved to be NP-hard problem that predicting RNA secondary structure including arbitrary pseudoknots, have yet not to find the polynomial algorithm.so the approximation algorithm is the nature methods to solve the NP-hard problem.Adjacent base pairs form stack, cross of base pairs form pseudoknots, cross of stems form pseudoknotted strcture. Now it is difficult to compute large RNA molecules for existing polynomial time predicting algorithms including pseudoknots. Find the optimal RNA structures based on combination with stems have become new method to predicting RNA pseudoknotted structure, such as the energy algorithm based on combination with stems presented by Benedeti to predict RNA secondary structures. Recently, Ruan give a heuristic algorithm based on stems to predict secondary structure including pseudoknots,which time complexity is O (n4) and space complexity is O (n2). We present a heuristic algorithm based on the relative stability of the stems and minimum free energy principle in RNA molecules according to RNA pseudoknotted model to predict RNA secondary structure with pseudoknots, which time complexity is O (n3) and space complexity is O (n2), the experimental tests on the RNA families in PseudoBase shows that the algorithm outperforms other known algorithms in predicting accuracy, sensitivity and specificity.Minimum free energy algorithms are more discussed on RNA structure prediction with simple pseudoknots by limit pseudoknots. Rivas algorithm which have been presented by Rivas and Eddy takes O (n6) time and O (n4) space to predict arbitrary planar pseudoknots and partly nonplanar pseudoknots. JR algorithm, presented by Jens and Robert, takes O (n4) time and O (n2) space,to predict simple nested pseudoknots. Lyngs algorithm, presented by Lyngs(?) and Pedersen, takes O (n5) time and O (n3) space,can only predict one planar pseudoknot. Adjacent stack can form stem,stack and stem is principal power of stabilizing RNA structure, RNA pseudoknotted strcture can be folded by maximizing weighted matching algorithm presented by Cary and Storm,but at the cost of lower predicting accuracy. Maximum stacking problem has also attracted attention in RNA secondary structure prediction with pseudoknots. It is NP-hard problem that maximizing the number of stacking pairs in a planar RNA secondary structure allowing pseudoknots, Ieong.S et al presented the problem of maximum stacked base pairing number, designed planar approximation algorithm and general approximation algorithm, and proved maximum stacking problem in planar RNA secondary structure belongs to NP-hard class. They analyse RNA secondary structure including pseudoknots, the characteristics of consecutive stacking pairs and pseudoknots, and approximation algorithm of computing maximum stacking problem,design the approximation ratio of1/3, and give the proof. We also discuss the computational complexity, the algorithm can predict more complex pseudoknots.Adjacent stack can form stem,according to the RNA optimized structure of stem,we divide the stem into some subsegments with the length is less than t(t>0), seach the optimal structure formed by subsegmens with the length less than t as the approximation structure of given sequence, and design a polynomial time approximation scheme(PTAS) to predict arbitrary pseudoknots.Given an RNA sequence S and an integer h, we show that the decision problem is NP-hard by reducing the tripartite matching problem to the problem. We show that the problem is NP-hard to maximize the number of stacking pairs in planar RNA secondary structure.The main work in this paper as follows:1. The representation of RNA structure based on the minimum free energy model is the key to RNA structure prediction. For pseudoknotted structure, we can divide it into planar pseudoknot and nonplanar pseudoknot. pseudoknots can form nested structure or parallel structure, two stems can form pseudoknotted structure, internal loop and bulge can constitute planar pseudoknot, planar pseudoknot often appear in the RNA molecule, cross pseudoknot also exist in RNA molecule. Stem play an important role in the RNA structure stability, based on the characteristics of the cross of stems can form pseudoknot, we can build the representation model of the pseudoknots by stem. In the pseudoknotted database of PseudoBase, there are lots of pseudoknots in it, it contains a small amount of nonplanar pseudoknots. It can obtain a good result that predicting RNA pseudoknotted structure by proper pseudoknotted model through design heuristic functionAccording to RNA pseudoknotted model, based on the relative stability of the stems and minimum free energy principle, we present a heuristic algorithm to predict RNA secondary structure with arbitrary pseudoknots, the experimental tests on the RNA families in PseudoBase database shows that the algorithm is better than other known algorithms in predicting, sensitivity, specificity and accuracy, which time complexity is O (n3) and space complexity is O (n2).2. In general, continuous stack can form the stem structure, stem structure can reduce the energy of RNA structure, and the structure is more stable. It is one of the important methods we use that predicting RNA opitimal structure through the combinatorial optimization characteristic of stems, it can be form parallel structure, nested structure and cross structure between stems. Containing cross structure is including pseudoknots, the existence of pseudoknots makes RNA structure prediction complicated, it is an important factor of intractability problem, and it makes design a polynomial time algorithm is extremely difficult. It is an important means to deal with the problem to design approximation algorithm or approximation solution of the problem.According to the optimal structure of stems, we divide the stem into sub-stems with the length is less than t(t>0), seach the optimal structure formed by sub-stems with the length less than t as the approximation structure of given seqaluence, and design a polynomial time approximation scheme.3. In the RNA sequence of bases, two successive base pairs can form a stack, from the perspective of the stack, many consecutive base pairs may form a continuous stack. The more the number of consecutive stack stack, the RNA structure is more stable, in the RNA structure prediction, it is also a NP-hard problem to compute of the maximum number of stack, for this kind of problem, it is difficult to design a polynomial time exact algorithm, it is better to design a polynomial time approximation algorithm through in-depth analysis of its intrinsic properties, we analyze approximation performance ratio of the approximation algorithm, try to lower approximation ratio, and guide solution of the problem.According to the characteristics of consecutive stacking pairs,we analyse the computational problem of maximum stacking of RNA secondary structure, find consecutive stacking pairs, and analyse the intrinsic characteristic,an approximation algorithm of computing maximum stacking problem is presented, which the approximation ratio is3, and give the proof of the approximation performance ratio.The future work as follows:1. Design approximation algorithm with arbitrary planar pseudoknots,reduce the approximation ratio and time complexity.2. Design approximation algorithm with general pseudoknots,reduce the approximation ratio and time complexity.3. According to RNA struture of planar and nonplanar,analysis the characteristics of structure and combinatorial optimization,design more accurrate algorithm.
Keywords/Search Tags:RNA Structure, Stacking, Pseudoknots, Approximation Algorithm, Approximation Scheme
PDF Full Text Request
Related items