Font Size: a A A

A Novel Algorithm On RNA Secondary Structure Prediction Including Pseudoknot Based On Combination Of Stems

Posted on:2010-12-22Degree:MasterType:Thesis
Country:ChinaCandidate:C XuFull Text:PDF
GTID:2120360272495742Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Along with going deep into RNA function, it is found that RNA is very important inevolution and inheritance and it is as emphasis as protein in genetic central dogma. Thefunction of RNA is correlated with its structure, so studying RNA structure is veryimportant. Biological experiments are precise but they cost too much money and time. It isa very good solution to the problem using computers to predict by means of Bioinformaticsand then studying the result. Therefore, the computer prediction methods of RNA secondarystructure developped rapidly.A large number of RNA secondary structure predictionmethods, software, RNA secondary structure databases and so on emergence.Pseudokonts are nestings which are exceptions of general situations includingembraces and parataxis. Producting pseudokonts is important and difficult in RNAsecondary structure prediction. Because there are not clearly definitions and reality thatprove which kind of pseudokonts are reasonable, therefore pseudokonts modeling is verydifficult. The current methods including pseudoknots neighter have low preciseness norspend long time and large space. So many methods do not consider pseudokonts, but theyare really existed.At present, a number of RNA secondary structure prediction algorithms are roughlydivided into the following categories:Comparative Sequence Analysis or PhylogeneticMethods, Dynamic Programming Methods, Combinatorial Algorithm, Heuristic Algorithmssuch as Genetic Algorithms, Neural Network Algorithms, Monte Carlo SamplingAlgorithms and so on. They each have their own advantages and disadvantages and havetheir own principles. These ideas have great influence to the development ofBioinformatics.In all these methods, the prediction results of Comparative Sequence Analysis are onlyworse than the experiments using x-ray or NMR structure determination. However, becausethe application of such methods required a certain number of related sequences that wehave known their secondary structure, and assuming that these sequences should have thesame secondary structure and a number of common basic structural units.For small samplesor the sequences that are significant different in source , the comparison of results are notreliable. Zuker's minimum free energy method can not predict pseudoknot structure, At thesame time for longer sequences the algorithm can't implement efficiency. BecauseCombinatorial Algorithm take into account of the pseudoknots prediction, it causes largerprediction error.Although the pseudoknot prediction have the ability to predict pseudoknots, but it is confined to a lot of conditions. Heuristic algorithms although can solve thelarge-scale combinatorial optimization problem, but they can not guarantee to reach globaloptimum, so the accuracy is not high, the implementation are more complex, it can be areference or be used in combination with other methods.So the current methods neither don't support the pseudoknot structure nor exist thedilemma problem of accurate rate and the time and space complexity .In this paper, we propose a method based on the combination of the stems to predictRNA secondary structure . It can predict pseudoknot structure at the same time it has ahigher prediction accuracy and spends lower time and space complexity. This methodmainly uses Compatibility Matrix, Iterative Matrix and Best Energy, finally we get the bestcombination of stems to predict RNA secondary structure, at the same time, we propose alot of innovation definitions such as Prefix, Surfix, Best Energy, Iterative Matrix etal, thesenew definitions are contributed to promote the success of algorithm implementation. Theyhave the references for other relevant studies of RNA structure prediction; we innovatelyproposed prefix and suffix matching algorithm. Comparing to the traditional use of thematrix method , it has exaltation in time and space complexity, the improvement is obviousin the longer RNA sequences. Through programming implementation of this method andselecting RNA sequences of PseudoBase, we use RNA secondary structure predictionresults to compare with the RNA secondary structures that are provided on database, usePknotsRG software to run the same data and compare the results with our RNA secondarystructures.The best sensitivity and specificity such as NGF-L2 and NGF-H1 were 100%.Test results and analysis of data show that the method is reliable, effective, and it has a highaccuracy rate, can predict pseudoknot structure, particularly it implement fast in tRNAsequences.The algorithm is divided into three parts:the generation of all largest stems of RNA,the generation of Compatibility Matrix, useing Iterative Matrix and Best Energy tocalculate the optimal combination of the stems.The reasonableness of stems has thefollowing considering: The stem length should be greater than or equal to three,the distancebetween two complementary areas of stem base is greater or equal to two, whether GUmatching is at both ends of the stems. Although some of the stems are not satisfied with theabove conditions, the conditions that the stems in the sequence can be opened one orseveral bases to meet that principle should also be taken into account. Or the result is nottrue enough, it probably missed some important stems. If it is reasonable stems or stemsmodified to be reasonable, calculate the energy of the stems and store them in the stems list.The purpose of this is as far as possible to reflect the RNA in the objective circumstances.Compatibility Matrix can be used to delineat the specific of the relationship between stems,it can clearly identify the formation of stacking, bulge loop ,interior loop, hairpin loop,multi-branched loop structure. For they can be able to calculate the accurate energy that isabsorbed; These are the important data to calculate the best energy. Use Iterative Matrix constantly iteration to find the combination of stems with the optimal energy of overallsituations. This combination can generate RNA secondary structure, It is a more reasonableand relatively stable combination , From the final results of tests we can find this feature.Because this method is to strike the best overall combination of the stems,as long as thepseudoknot structure stems that are part of the optimal combination of the stems causingbest energy, then they can be retained to predict 14 kinds of possible pseudoknots.The characteristics of this algorithm is to get the overall optimal solution situation, italso considers more optimal solutions; it can retain a number of solutions rather than just asingle solution and neglect other solutions, this will guarantee the authenticity of the results.In addition, because the relationships between stems are clearly delineated, they can beimplemented in both cases that allow pseudoknots and do not allow pseudoknots; this willbe used in some RNA sequences without pseudoknots .The restriction to the obtained stemsis also more flexible, it can be generated to improve the efficiency of algorithms. Algorithmspends smaller space complexity.After the completion of this method we study how to further improve the efficiency ofalgorithms, find the optimal combination of parameters through numerous tests . Accessingrelevant information and increasing parameters to improve the accuracy of prediction.Improve continuously toward third structure of the RNA and protein which have complexstructure.
Keywords/Search Tags:RNA Secondary Structure, Prefix, Surfix, Stems, Best Energy, Compatibility Matrix, Iterative Matrix
PDF Full Text Request
Related items