
Analysis And Comparison On RNA Secondary Structure Prediction Algorithm

Posted on: 2010-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: D Yu
Full Text: PDF
GTID: 2178360272495787
Subject: Computer application technology
Abstract/Summary:
After decades of development, research on RNA structure prediction has made great progress. Newly discovered classes of RNA with important functions, such as ribozymes, antisense RNA, small nucleolar RNA, double-stranded small interfering RNA and microRNA, have deepened our understanding of the diversity and complexity of RNA. As the central role of RNA has been recognized anew, RNA has become an active topic in bioinformatics research.
Research on RNA structure prediction began early. In 1981 Zuker proposed the minimum free energy algorithm, which, after some twenty years of continuous improvement and development, has become the most widely used method for RNA secondary structure prediction. However, its average prediction accuracy is only 50 to 70%, which is not high enough, and the algorithm's own restrictions prevent it from predicting pseudoknots and more complex tertiary interactions. The method therefore cannot meet the growing demands of current research on RNA structure. To predict pseudoknots and tertiary structure, new algorithms and improved versions of existing ones have been put forward, and RNA structure prediction has again attracted research attention and become a hot issue in bioinformatics.
The various functions of RNA are closely linked to its specific structure, so exploring those functions further requires knowledge of the secondary and even the tertiary structure. For some non-coding or structural RNAs, structure is more conserved than sequence. Determining their specific structures not only gives a more detailed understanding of how the various types of RNA operate in the cell, but also helps in searching for new genes in the genome and in improving the accuracy of protein structure prediction.
Because RNA molecules degrade quickly and are difficult to crystallize, it is hard to determine their three-dimensional structure by experimental methods such as X-ray diffraction and nuclear magnetic resonance (NMR). Although the results of these methods are accurate and reliable, they obviously cannot keep up with the current mass of biological sequences. Therefore, as in protein structure studies, using computers and a variety of mathematical techniques to predict the spatial structure of RNA is a shortcut that makes the study of RNA more efficient, and it is the principal method to rely on.
In this paper, the prediction algorithms are classified into comparative sequence analysis, dynamic programming within the minimum free energy method, combinatorial optimization algorithms, the maximum weight matching algorithm, heuristic algorithms, and others. Each has advantages and disadvantages, and none of them alone solves the RNA secondary structure prediction problem. Each is also applicable only under certain conditions, so the current research trend is to combine several algorithms that cross-validate one another.
Pseudoknot prediction is a difficult point in RNA structure prediction. It is challenging both from the point of view of the algorithm and from the point of view of software upgrading and application. At this stage RNA structure prediction still suffers from relatively poor accuracy and from overly idealized restrictions on the problem. Strengthening the capacity for pseudoknot prediction is therefore inevitable in both algorithm design and software design.
RNA prediction software provides a prediction platform built on a variety of prediction algorithms and has greatly facilitated the study of RNA.
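The dynamic programming idea behind minimum free energy prediction can be illustrated with a much simpler stand-in. The sketch below is not Zuker's algorithm: instead of minimizing free energy it maximizes the number of nested base pairs, a Nussinov-style recursion. The function name `nussinov`, the set of allowed pairs and the minimum hairpin loop of three unpaired bases are assumptions made for illustration. Like the minimum free energy method described above, it considers only nested pairs and so cannot produce pseudoknots.

```python
# Allowed pairs: Watson-Crick plus the G-U wobble pair (an illustrative choice).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (pair_count, dot_bracket) for a maximum set of nested base pairs.

    Simplified stand-in for energy-based folding: every pair scores 1.
    """
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of nested pairs within seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    inside = best[i + 1][k - 1]
                    after = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + inside + after)
            best[i][j] = score

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                inside = best[i + 1][k - 1]
                after = best[k + 1][j] if k < j else 0
                if best[i][j] == 1 + inside + after:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return best[0][n - 1], "".join(structure)
```

For the toy sequence `GGGAAAUCC`, `nussinov` returns three pairs and the hairpin `(((...)))`. Real minimum free energy programs replace the unit score with experimentally derived energy terms for stacks and loops, which this sketch does not attempt.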
As research on RNA prediction has deepened, software of various styles and functions has been continuously improved. In this paper, a variety of RNA secondary structure prediction programs are introduced. Programs built on different prediction algorithms differ widely: some have higher prediction accuracy and some lower. Some run on the Linux and Unix operating systems, and some authors provide web-based online prediction services, which are convenient for researchers.
How to choose suitable software to serve researchers is worth considering. Faced with many different prediction programs, selecting one suited to the type and length of the RNA sequence is very conducive to prediction efficiency and to the reliability of the results. Meanwhile, different users have different requirements: some emphasize free energy analysis, including the free energy values of the individual parts of the structure; others focus on a realistic two-dimensional image of the structure. Given the diversity of RNA and the breadth of research, researchers are no longer satisfied with a single program for RNA structure prediction; they use a wide range of algorithms and a variety of programs to analyze RNA data, striving for a more intuitive graphical interpretation. This also drives the design and improvement of RNA software toward multi-algorithm, intelligent and comprehensive tools. At the same time, a deeper understanding of the applicability and limitations of the various programs can help improve software efficiency and inform new software designs.
Most RNA prediction software is the result of research abroad.
In recent years, scholars in China have raised the standard of prediction algorithms and tools, proposing a number of new algorithms together with a series of prediction programs that run locally.
Most of the literature currently measures prediction accuracy with three evaluation parameters: sensitivity, specificity and the Matthews correlation coefficient (MCC). The notions of false negative and false positive are commonly used when results are measured against experiment. In RNA secondary structure prediction, TP (true positive) is the number of correctly predicted base pairs; FN (false negative) is the number of base pairs that exist in the real structure but are not predicted; FP (false positive) is the number of base pairs that do not exist in the real structure but are wrongly predicted; and TN (true negative) is the number of possible base pairs that are absent from the real structure and are correctly not predicted. Generally TN is much larger than TP, FN and FP, so in practice it is rarely used. Sensitivity (X) is the percentage of base pairs in the real structure that are correctly predicted. Specificity (Y) is the percentage of all predicted base pairs that are correct. Prediction methods generally find it hard to do well on both and tend to favor one or the other, so the MCC is used as a compromise measure.
This paper aims to raise the standard of software design and to provide a basis for reference by running representative RNA structure prediction programs on specific data and comparing their efficiency. Five commonly used programs are selected: RNAfold, RNAstructure, RNAshapes, Mfold and Pfold. tRNA sequences with known structures are run through each program, and the structure predicted by each is obtained.
By comparing the secondary structures predicted by each program, the advantages and disadvantages of the software are analyzed with a view to future software design, and some personal views and opinions are put forward...
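The evaluation measures defined in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's own evaluation code: it assumes both structures are given as pseudoknot-free dot-bracket strings, and the function names `parse_dot_bracket` and `evaluate` are hypothetical. Because TN is rarely used in practice, the MCC is computed here with the approximation sqrt(X * Y) that is common in the RNA literature when TN is very large; whether the thesis uses this approximation or the full four-term formula is not stated in the abstract.

```python
from math import sqrt


def parse_dot_bracket(structure):
    """Return the set of base pairs (i, j), i < j, in a dot-bracket string."""
    stack = []
    pairs = set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def evaluate(reference, predicted):
    """Compare a predicted structure with the real (reference) structure."""
    real = parse_dot_bracket(reference)
    pred = parse_dot_bracket(predicted)
    tp = len(real & pred)  # pairs predicted correctly
    fn = len(real - pred)  # real pairs that were missed
    fp = len(pred - real)  # predicted pairs absent from the real structure
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # X
    specificity = tp / (tp + fp) if tp + fp else 0.0  # Y, as defined in the text
    mcc = sqrt(sensitivity * specificity)  # approximation, assumes TN is very large
    return {
        "TP": tp,
        "FN": fn,
        "FP": fp,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

As a usage example, `evaluate("((((...))))", ".((((.)))).")` finds three shared pairs, one missed pair and one wrongly predicted pair, so sensitivity, specificity and the approximate MCC are all 0.75.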
Keywords/Search Tags: Bioinformatics, RNA secondary structure prediction, Software, Evaluation