Identifying Pre-miRNA Using Couplet-syntax For Local Sequence-structure Information

Posted on: 2011-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: M H Wang
GTID: 2178330338476136
Subject: Biomedical engineering
Abstract/Summary:
MicroRNA (miRNA) is a class of small, non-coding, single-stranded RNA discovered in recent years that plays an important role in the regulation of gene expression. Many studies suggest that miRNA is involved in biological processes such as disease development, organism development, and cell differentiation. Identifying miRNAs is therefore the first step toward understanding in depth how they regulate gene expression. miRNAs can be identified by either computational or experimental methods. Computational identification is efficient and inexpensive, so it has attracted growing attention. In this thesis, we study algorithms for the computational identification of miRNA and present new algorithms for identifying single-looped and multiple-looped pre-miRNAs. The work includes the following aspects:

(1) Couplet-syntax, a new feature extraction algorithm for pre-miRNA prediction, is introduced to describe the local structure-sequence information of pre-miRNA. Couplet-syntax processes the sequence and secondary structure of a pre-miRNA as follows: ① it describes the substructures (including bulges and symmetric inner loops) of the secondary structure precisely; ② it masks frequently varying nucleotides with a null nucleotide; ③ it calculates the frequency of each couplet, which is made up of a nucleotide and a structure symbol. To test the syntax, we train an SVM classifier on the features extracted by couplet-syntax. On the human dataset of miRBase 12.0, the classifier achieves a sensitivity of 81.98% and a specificity of 87.16%. The classifier also identifies 86.71% of the pre-miRNAs of all other species in miRBase 12.0.
Experiments on the same dataset show that couplet-syntax represents the most robust and intrinsic features of pre-miRNA better than the traditional local structure-sequence feature extraction algorithm.

(2) Couplet-syntax is used to extract local structure-sequence information of single-looped pre-miRNAs. However, no computational identification method has so far been designed specifically for predicting multiple-looped pre-miRNAs. We carried out a preliminary study of multiple-looped pre-miRNA prediction and present a new cleavage-based identification algorithm. The algorithm cleaves a multiple-looped pre-miRNA into several single-looped segments and then selects, as the primary segment, the segment that contains the most features of the multiple-looped pre-miRNA. The primary segment is used to represent the multiple-looped pre-miRNA, and whether the sequence is a multiple-looped pre-miRNA is determined from the features of the primary segment. Tests give satisfactory classification precision, which supports the validity of the method.

(3) We provide the source code of couplet-syntax and have also made the algorithm available online using CGI programming. Users can now run the algorithm over the Internet, which greatly simplifies its use.

(4) We also study prediction algorithms for multiple-looped pre-miRNA and present a new cleavage-based algorithm to identify potential multiple-looped pre-miRNAs.
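Step ③ of couplet-syntax in item (1), counting how often each nucleotide/structure-symbol couplet occurs, can be illustrated with a minimal sketch. The abstract does not define the structure alphabet or the masking rule, so everything specific here is an assumption made for illustration: the dot-bracket input, the two structure symbols (`|` for paired, `.` for unpaired), the caller-supplied `mask_positions`, and the function name `couplet_frequencies`. It is not the thesis implementation.

```python
from collections import Counter


def couplet_frequencies(sequence, structure, mask_positions=()):
    """Return the relative frequency of each (nucleotide, structure) couplet.

    Assumptions (not taken from the thesis): the structure is given in
    dot-bracket notation, paired positions map to '|' and unpaired ones
    to '.', and masked positions are replaced by the null nucleotide 'N'.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")

    masked = set(mask_positions)
    counts = Counter()
    for index, (nucleotide, symbol) in enumerate(zip(sequence.upper(), structure)):
        if index in masked:
            nucleotide = "N"  # null nucleotide stands in for a variable position
        state = "." if symbol == "." else "|"  # unpaired vs. paired
        counts[nucleotide + state] += 1

    total = len(sequence)
    return {couplet: count / total for couplet, count in counts.items()}


if __name__ == "__main__":
    # Toy hairpin; position 2 is masked purely for demonstration.
    print(couplet_frequencies("GCAUGC", "((..))", mask_positions=[2]))
```

The resulting frequency dictionary is the kind of fixed-length feature vector that could be passed to an SVM once a fixed ordering of couplets is chosen.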
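The cleavage idea in item (2) can be sketched as follows, under stated assumptions. The abstract does not say how the cleavage points or the primary segment are chosen, so this sketch assumes a dot-bracket secondary structure, cuts each hairpin branch where it meets a multi-branch loop, and uses the number of base pairs as a stand-in for "contains the most features". The function names `pair_table`, `single_loop_segments`, and `primary_segment` are hypothetical.

```python
def pair_table(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            left = stack.pop()
            pairs[left], pairs[pos] = pos, left
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def single_loop_segments(structure):
    """Cleave a structure into (start, end) spans that each hold one hairpin."""
    pairs = pair_table(structure)
    n = len(structure)
    segments = []
    for i, j in sorted(pairs.items()):
        # Keep only innermost pairs, i.e. those that close a hairpin loop.
        if i > j or "(" in structure[i + 1:j]:
            continue
        # Grow the stem outward across bulges and inner loops until the
        # next enclosing pair is missing or belongs to a multi-branch loop.
        while True:
            p, q = i - 1, j + 1
            while p >= 0 and structure[p] == ".":
                p -= 1
            while q < n and structure[q] == ".":
                q += 1
            if p >= 0 and q < n and pairs.get(p) == q:
                i, j = p, q
            else:
                break
        segments.append((i, j))
    return segments


def primary_segment(structure):
    """Pick the segment with the most base pairs (an assumed criterion)."""
    segments = single_loop_segments(structure)
    if not segments:
        raise ValueError("structure contains no hairpin")
    return max(segments, key=lambda s: structure[s[0]:s[1] + 1].count("("))


if __name__ == "__main__":
    toy = "((..((...))..(((....)))..))"  # two hairpins in one multi-branch loop
    print(single_loop_segments(toy))
    print(primary_segment(toy))
```

Features for classification would then be extracted from the primary segment alone, for example with the same couplet counting used for single-looped pre-miRNAs.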
Keywords/Search Tags: Single-loop pre-miRNA prediction, SVM, Couplet-syntax, Multiple-loop pre-miRNA prediction