Analysis of ncRNA Based on Sequence and Structure

Posted on: 2011-04-21, Degree: Master, Type: Thesis
Country: China, Candidate: H Xu, Full Text: PDF
GTID: 2120360305454733, Subject: Software engineering
Abstract/Summary:
The main contents and contributions of the dissertation are summarized as follows. Non-coding RNAs play a pivotal role in the activities of organisms. For example, tRNA delivers amino acids and takes part in protein synthesis; SSU rRNA5 combines with proteins to form the ribosome; mir1302 fine-tunes gene expression and the cell cycle; and RNasebacta has a catalytic function like that of a protein enzyme. To study the different functions of these four non-coding RNAs further, we must first be able to predict their structures. This thesis presents three methods for recognizing non-coding RNA sequences.

The first method classifies the four categories of non-coding RNA sequences by their consensus ("consistent") structures. First, a consensus sequence and a phylogenetic tree are obtained for each category with the multiple sequence alignment software ClustalW2. Second, the consensus structure is predicted from the consensus sequence. We find that mir1302 has a long-handled structure; tRNA has a cloverleaf shape; RNasebacta has one large open loop and one glandulifera; and SSU rRNA5 has one large open loop and two glandulifera.

The second method gives a new representation of an RNA sequence based on a sequence descriptor, the quantitative descriptive DV-curve. The advantage of this approach is that it can represent longer RNA sequences. The first step assigns the coordinates of two points to each of the four bases (A, G, C, and U). The second step computes the horizontal and vertical coordinates of each base in the sequence. The third step connects the resulting points, starting from the origin, so that a DV-curve is formed.
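The three DV-curve steps can be sketched roughly as follows. The abstract does not give the coordinate table, so the up/down assignment in `BASE_STEPS` is an illustrative assumption and not the thesis's actual mapping; the function name `dv_curve` is likewise hypothetical. Each base contributes two unit steps along the horizontal axis, so a sequence of n bases yields 2n + 1 points starting at the origin.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

# ASSUMPTION: illustrative mapping of each base to two vertical moves
# (+1 = up, -1 = down). The thesis's real assignment is not stated
# in the abstract and may differ.
BASE_STEPS: Dict[str, Tuple[int, int]] = {
    "A": (1, 1),    # up, up
    "U": (-1, -1),  # down, down
    "G": (1, -1),   # up, down
    "C": (-1, 1),   # down, up
}


def dv_curve(seq: str) -> List[Point]:
    """Return the polyline points of a DV-curve-style walk for an RNA sequence."""
    x, y = 0, 0
    points: List[Point] = [(x, y)]  # the curve starts at the origin
    for base in seq.upper():
        if base not in BASE_STEPS:
            raise ValueError(f"unexpected base: {base!r}")
        for dy in BASE_STEPS[base]:
            x += 1       # every step advances one unit horizontally
            y += dy      # and one unit up or down
            points.append((x, y))
    return points
```

Plotting the returned points for sequences from different families would give curves that can be compared visually, which is the use the abstract describes.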
Through the DV-curve, the differences among the four types of RNA can be clearly distinguished.

The third method is based on the support vector machine (SVM). The first step extracts sequence features from the four types of RNA sequences: the frequencies of single bases (A, G, C, and U), of the 16 dinucleotides (AA, AC, AG, AU, CA, CC, CG, CU, GA, GC, GG, GU, UA, UC, UG, UU), and of the trinucleotides. The second step obtains the secondary structure in dot-bracket form from secondary structure prediction software and derives structure features from it: the frequencies of the single symbols "(", ".", and ")"; of the nine symbol pairs "((", "()", "(.", ")(", "))", ").", ".(", ".)", and ".."; and of the symbol triples. The third step combines the sequence and structure features. In the fourth step, an SVM trained on these features predicts the four categories of non-coding RNA sequences, and the corresponding recognition rate is obtained. Comparing the recognition rates shows that prediction accuracy is highest when the one-, two-, and three-symbol features of both sequence and structure are used together.

All of the above are methods for recognizing non-coding RNA sequences. For accurate classification, they should be used in combination, which gives more precise results.
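The feature extraction described for the SVM method can be sketched as below. This is a minimal illustration under stated assumptions: the helper names `kmer_rates` and `feature_vector` are hypothetical, rates are computed over overlapping windows, and the dot-bracket string is assumed to come from whatever secondary structure prediction tool is used (the abstract does not name one). The result is a 123-dimensional vector: 4 + 16 + 64 sequence rates and 3 + 9 + 27 structure rates.

```python
from itertools import product
from typing import Dict, List


def kmer_rates(text: str, alphabet: str, k: int) -> Dict[str, float]:
    """Rate of every length-k word over `alphabet` among overlapping windows of `text`."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    n_windows = max(len(text) - k + 1, 0)
    for i in range(n_windows):
        window = text[i:i + k]
        if window in counts:  # windows with unexpected symbols are ignored
            counts[window] += 1
    denom = max(n_windows, 1)  # avoid division by zero on very short input
    return {kmer: counts[kmer] / denom for kmer in kmers}


def feature_vector(seq: str, dot_bracket: str) -> List[float]:
    """Concatenate 1-, 2-, and 3-symbol rates of the sequence and its structure."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    features: List[float] = []
    for k in (1, 2, 3):
        features.extend(kmer_rates(seq.upper(), "ACGU", k).values())
    for k in (1, 2, 3):
        features.extend(kmer_rates(dot_bracket, "(.)", k).values())
    return features
```

For example, `feature_vector("GGGAAACCC", "(((...)))")` returns a list of 123 rates. Vectors of this kind, one per RNA with its family label, could then be passed to any SVM implementation for training and evaluation.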
Keywords/Search Tags: Non-coding RNA, Multi-sequence alignment, Consistent structures, Phylogenetic tree, DV-curve, Support vector machine