Font Size: a A A

Research On RNA Secondary Structure Prediction Methods

Posted on:2012-06-25Degree:DoctorType:Dissertation
Country:ChinaCandidate:J YuFull Text:PDF
GTID:1118330335452014Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Ribonucleic acid(RNA) is one kind of nucleic acids, which is one of the three most important macromolecular compounds. In the late 1970s, the research on RNA had accumulated many new discoveries. The finding of regulation RNA, the completion of Human Genome Program and the rise of post genome research make the research on RNA reach a new peak. With the development of RNA, the international institute of RNA was founded in 1993 and a new academic periodical, named RNA, was first published in 1995. The concept of RNomics brought forward in 2000, together with Genomics, Proteomics and Metabonomics, forms system biology. They will make important contributions for solving hard problems in the post genome era.So far we have obtained amounts of information of sequences, but the functions of most of them are unknown. Usually the function is associated with structure. In other words, the RNA sequences which accomplish the same function often have the similar structures. Therefore, many researchers have been attracted by the prediction of RNA secondary and 3D structure. Empirical approaches include X-ray crystallography and nuclear magnetic resonance spectroscopy but they are expensive and very time consuming and are not effective for all sequences. The researchers also adhere to the principle that the 3D structure can not be predicted directly through the sequence. The prediction of the secondary structure is the only way to predict the 3D structure. As a result, simulating the secondary structure will continue to become increasing important especially as the reliability of such methods improves.In this paper four different methods have been proposed to predict RNA secondary structure:(1 prediction of RNA secondary structure using dynamic programming method based on stems. In this paper, a new method has been presented to predict the RNA secondary structure which combines the characteristics and advantage of combinatorial optimization approach and traditional dynamic programming algorithm. For a given RNA sequence, firstly the set which contains all possible stems can be obtained. Adopting what are called loop dependent energy rules, the energy of each stem is calculated according to the length of the loop. Then, the secondary structure is constructed using dynamic programming algorithm based on stems. Our program was tested in examples with RNA sequences and the experiment results showed that our program is better than genetic algorithm in specificity, sensitivity and Matthews correlation coefficient, and is lower than traditional dynamic programming algorithm in complexity.(2)Simulating the folding pathway of RNA secondary structure using the modified ant colony algorithm. For a given RNA sequence, the set of all possible stems is obtained and the energy of each stem is calculated and stored at the initial stage. Then a folding pathway is simulated, including such processes as construction of the heuristic information, the rule of initializing the pheromone, the mechanism of choosing the initial and next stem and the strategy of updating the pheromone between two different stems. Compared with GA, ant colony algorithm embodies the inherent connections between different stems when choosing the next stem. On the contrary, GA is blind in the process of obtaining the initial solutions and crossing two solutions to generate new ones. Furthermore a more realistic formula has been used to calculate the energy of multi-branch loop compared with the dynamic programming algorithm. Finally by testing RNA sequences with known secondary structures from the public databases, we analyze the experimental data to select appropriate values for parameters. The measure indexes show that our procedure is more consistent with phylogenetically proven structures than software RNAstructure sometimes.(3)Prediction of RNA secondary structure based on fuzzy adaptive ant colony system. In the traditional ant colony algorithm the parameter q0 is set to a fixed value. To demonstrate its deficiency a linear control model is presented, in which the parameter q0 increases linearly with the iteration time. And the experiment results show that the performance of improved ant colony algorithm has a better global search capability. The main difference between the two is that the value of parameter q0 is smaller at the early iteration stage in the modified ant colony algorithm. With the iteration time increasing, the value of parameter q0 gradually increases too, which also leads to the algorithm trapped into the local optimum. Therefore a fuzzy adaptive control model is introduced to adjust the parameter q0 dynamically according to the fitness function's value distribution and the maximum iteration time of the optimal value unchanged. Using the classical traveling salesman problems as the benchmark, the experiment results indicate that the fuzzy adaptive ant colony system(FACS) has the stronger local and global search ability. When FACS is applied to predict the secondary structure, another new strategy is presented to enhance the efficiency, which is that extending the stems in two directions after the secondary structure is formed. This strategy greatly reduces the total number of suboptimal secondary structures. The methods (3) and (4) are from the same angle and both of them belong to the combinatorial optimization algorithm. The difference is that the performance of FACS is better than ant colony algorithm.(4)Prediction of RNA secondary structure based on frame construction. At the beginning of the iteration, the variable Emin is initialized with 0. The algorithm selects the first stem whose length is equal or greater than 4, then the allowed set is updated and the next stem will be selected till the allowed set is empty. A new structure is generated and its energy is calculated and saved in the variable Ecur. When the condition Ecur< 0.8*Emin satisfied, the structure will be saved as a new one if it has not been occurred yet, otherwise the occurrence time plus 1. Comparing Ecur with Emin, the value of Emin is equal to Ecur if Ecur
Keywords/Search Tags:bioinformatics, RNA secondary structure prediction, dynamic programming method, stem, ant colony system, fuzzy adaptive ant colony system, frame
PDF Full Text Request
Related items