A New Visual Representation For RNA Secondary Structure And Its Application

Posted on: 2011-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Liang
Full Text: PDF
GTID: 2178360308968907
Subject: Computer Science and Technology
Abstract/Summary:
The comparison and analysis of biological sequences is one of the hot spots of bioinformatics. Biological sequences generally refer to DNA, RNA, or protein sequences. As research has developed, RNA, which carries genetic information, has become a focus of study. RNA secondary structure is more conserved than the primary sequence, and much information useful for classification and phylogenetic analysis has been found in it. The analysis of RNA secondary structure is therefore of great significance and value. This thesis studies the similarity of RNA secondary structures. We propose two methods for analyzing that similarity, one based on a new visual representation and the other on Lempel-Ziv complexity. This provides a new way to visualize and analyze biological sequences. The main work of this thesis is as follows:

(1) We propose a new visual representation for RNA secondary structure, the CZ curve, and introduce two of its properties. From the CZ curve we plot the projection graphs of the points corresponding to the RNA secondary structures, from which some information about base composition and similarity can be read directly. We then apply the method to compute the similarity of RNA secondary structures. After presenting the similarity results, we combine the similarity matrix with hierarchical clustering algorithms to build a phylogenetic tree for 11 real RNA secondary structures. The results show that the method not only analyzes the similarity between RNA secondary structures (including pseudoknots) effectively, but also classifies different kinds of RNA secondary structures accurately. Moreover, the method needs only the geometric center of the characteristic curve of each RNA secondary structure to compute the distance matrix, so it has low computational complexity.

(2) Because different RNA secondary structures may correspond to the same characteristic sequence, we propose a new way to describe the characteristic sequence of an RNA secondary structure and give the rules to follow in the conversion process. We then compute the similarity between the new characteristic sequences using Lempel-Ziv complexity. We choose two data sets from paragraph 3 as our test data. The results are consistent with the analyses reported in the literature, which shows that our method effectively extracts the structural information of secondary structures and avoids the problem of different RNA secondary structures corresponding to the same characteristic sequence.
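The Lempel-Ziv step in (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not give the new characteristic-sequence encoding or the exact distance formula, so the sketch uses plain dot-bracket strings as hypothetical stand-ins for characteristic sequences and the normalized Lempel-Ziv distance of Otu and Sayood as an assumed similarity measure. The names `lz_complexity` and `lz_distance` are illustrative.

```python
def lz_complexity(s: str) -> int:
    """Count the components in the exhaustive history of s (Lempel-Ziv, 1976)."""
    n = len(s)
    i = 0
    count = 0
    while i < n:
        length = 1
        # Grow the component while it can still be copied from the text
        # that precedes its last symbol (overlapping copies are allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        # The component is the longest copyable part plus one new symbol
        # (or whatever remains when the string ends first).
        count += 1
        i += length
    return count


def lz_distance(s: str, q: str) -> float:
    """Normalized LZ distance (assumed measure, not confirmed by the abstract):
    max(c(sq) - c(s), c(qs) - c(q)) / max(c(s), c(q))."""
    cs, cq = lz_complexity(s), lz_complexity(q)
    if max(cs, cq) == 0:
        return 0.0  # both sequences are empty
    return max(lz_complexity(s + q) - cs, lz_complexity(q + s) - cq) / max(cs, cq)


if __name__ == "__main__":
    # Hypothetical stand-in sequences in dot-bracket notation; the thesis
    # defines its own characteristic sequences, which are not shown here.
    structures = {
        "A": "((((...))))..((...))",
        "B": "((((...))))..(((...)))",
        "C": "..((..((...))..))..",
    }
    names = sorted(structures)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            d = lz_distance(structures[x], structures[y])
            print(f"d({x}, {y}) = {d:.3f}")
```

The pairwise distances form a symmetric matrix that could then be passed to a hierarchical clustering routine, as in (1). The substring search keeps the sketch short at the cost of speed; longer sequences would call for a suffix-structure-based parse.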
Keywords/Search Tags: RNA Secondary Structure, Visual Representation, Analysis of Similarity, Phylogenetic Tree, Lempel-Ziv Complexity