
Research On Relevant Computational Problems Of Noncoding RNA

Posted on: 2011-11-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Zhao
Full Text: PDF
GTID: 1100360308485575
Subject: Control Science and Engineering
Abstract/Summary:
Non-coding RNAs (ncRNAs) are defined as all functional RNA transcripts other than protein-encoding messenger RNAs (mRNAs). ncRNAs play key roles in many life processes, including gene regulation, chromatin remodeling, gene localization, gene modification and DNA imprinting. Research on ncRNAs is important both in theory and in application, and it also provides necessary tools for exploring the nature of life. Experimental methods are essential for understanding ncRNAs, but they are usually expensive and time consuming, and they often lack clear targets. With the successive completion of genome sequencing projects and the establishment and enrichment of the corresponding databases, computational tools for studying ncRNAs have become both feasible and badly needed. This dissertation focuses on classic computational problems related to ncRNAs, including sequence-structure alignment, secondary structure prediction and identification of ncRNA genes. The main contents and contributions of the dissertation are summarized as follows:

1. Research on ncRNA sequence-structure alignment. Sequence alignment is one of the classic problems in computational molecular biology. ncRNA molecules are highly conserved in secondary structure but share little sequence similarity, so traditional multiple alignment methods fail to meet the needs of ncRNA analysis. Computing reliable ncRNA alignments therefore requires taking structural information into account, which markedly increases computational complexity. To deal with this problem, we employ the quantum genetic algorithm (QGA), which is based on concepts and principles of quantum computing such as the quantum bit and the superposition of states. Moreover, we design a new full interference pair crossover operator and construct a fitness function that considers sequence and structure information simultaneously.
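The Q-bit idea can be illustrated with a minimal sketch. It is a generic, textbook-style QGA on a toy bit-counting problem, not the dissertation's alignment encoding: the names `observe`, `rotate`, `qga` and `toy_fitness` are hypothetical, the rotation step size is an arbitrary assumption, and the full interference pair crossover operator and the sequence-structure fitness function are not reproduced here.

```python
import math
import random

def observe(thetas, rng):
    """Collapse each Q-bit to a classical bit: P(bit = 1) = sin(theta)^2."""
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]

def rotate(thetas, bits, best_bits, delta=0.05 * math.pi):
    """Rotation-gate update: nudge each Q-bit toward the best solution's bit."""
    eps = 0.01  # keep angles away from 0 and pi/2 so no bit is fully frozen
    updated = []
    for t, b, g in zip(thetas, bits, best_bits):
        if b != g:
            t += delta if g == 1 else -delta
        updated.append(min(max(t, eps), math.pi / 2 - eps))
    return updated

def qga(fitness, n_bits, pop_size=10, generations=60, seed=0):
    """Maximize `fitness` over bit strings with a population of Q-bit angles."""
    rng = random.Random(seed)
    # theta = pi/4 is the equal superposition: P(0) = P(1) = 1/2
    pop = [[math.pi / 4] * n_bits for _ in range(pop_size)]
    best_bits, best_fit = None, float("-inf")
    for _ in range(generations):
        observed = [observe(ind, rng) for ind in pop]
        for bits in observed:
            f = fitness(bits)
            if f > best_fit:
                best_bits, best_fit = bits[:], f
        pop = [rotate(ind, bits, best_bits) for ind, bits in zip(pop, observed)]
    return best_bits, best_fit

def toy_fitness(bits):
    """Stand-in objective (number of ones); an assumption, not an alignment score."""
    return sum(bits)

best, score = qga(toy_fitness, n_bits=16)
```

In an alignment setting, the observed bit string would have to be decoded into gap positions and scored by a function that combines sequence and structure terms; that decoding is outside the scope of this sketch.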
Experiments on BRAliBase show that QGA performs well without premature convergence, and has shorter optimization time and higher solution quality than the conventional genetic algorithm.

2. Research on ncRNA secondary structure prediction. The secondary structures of ncRNAs, which determine their function, are crucial to related research. Most traditional methods for ncRNA secondary structure prediction use optimization algorithms, which suffer from high space and time complexity. Given aligned ncRNA sequences, we treat secondary structure prediction as a classification problem: judging, from the information provided by the alignment, whether any two columns in the alignment correspond to a base pair. After analyzing the various computational measures used in existing prediction methods, we compared the classification capability of those measures quantitatively using filter and wrapper approaches combined with a support vector machine (SVM) classifier. As a result, an optimal subset of computational measures, including thermodynamic, covariation and phylogenetic information, was selected for predicting RNA secondary structure by classification. Our method uses an SVM classifier with the selected measures, together with rules of stem combination, to predict ncRNA secondary structure, and represents a new methodology for future ncRNA secondary structure prediction approaches.

3. Research on the precursors of microRNA genes. General-purpose computational methods for identifying ncRNA genes are far from satisfactory because ncRNA genes carry fewer signals than protein-coding genes; moreover, they are widely distributed in the genome and vary greatly in kind and length. As one of the important regulatory ncRNAs, microRNA plays crucial roles in many life processes. Identifying microRNA precursors (pre-miRNAs) is a primary step in analyses involving microRNA genes.
Although the hairpin secondary structure is a distinguishing feature of pre-miRNAs, a large number of sequences that are not pre-miRNAs also fold into hairpins. Focusing on the hairpin secondary structure, we study prediction methods that distinguish pre-miRNA hairpins from pre-miRNA-like pseudo hairpins. First, 25 novel local features for identifying the hairpin structures of pre-miRNAs are proposed by pulling the RNA hairpin; they capture characteristics not only of the stem but also of the bulges and interior loops in the structure. Tests show that the classifier with the new features outperforms the 3SVM. Second, to characterize the detailed information of the pre-miRNA hairpin, four topological indices weighted by free energy are defined. Exploration of these indices shows that they can not only characterize the topological connection of elements but also depict the composition and relative position of bases in the structure. Finally, we select 23 features from 52 candidates, which include the 4 new topological indices, as the feature set for identifying pre-miRNAs. Moreover, by handling the class imbalance problem in the datasets, an effective classifier model for pre-miRNAs is developed.
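To make the notion of local hairpin features concrete, the sketch below counts a few simple descriptors (stem pairs, terminal loop size, bulges, interior loops, G-C pair fraction) from a sequence and its dot-bracket structure. These are illustrative stand-ins, not the 25 local features or the free-energy-weighted topological indices of the dissertation; the function names and the example hairpin are hypothetical, and the input is assumed to be a single unbranched hairpin in dot-bracket notation.

```python
def pair_table(structure):
    """Map every paired position to its partner in a dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner

def hairpin_features(seq, structure):
    """Count simple local descriptors of a single, unbranched hairpin."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    partner = pair_table(structure)
    # Base pairs ordered from the outermost to the innermost
    pairs = sorted((i, j) for i, j in partner.items() if i < j)
    if not pairs:
        raise ValueError("no base pairs found")
    bulges = interior = 0
    for (i1, j1), (i2, j2) in zip(pairs, pairs[1:]):
        if not (i2 > i1 and j2 < j1):
            raise ValueError("not a single hairpin")
        left = i2 - i1 - 1   # unpaired bases on the 5' strand
        right = j1 - j2 - 1  # unpaired bases on the 3' strand
        if left and right:
            interior += 1
        elif left or right:
            bulges += 1
    i_in, j_in = pairs[-1]
    gc = sum(1 for i, j in pairs if {seq[i], seq[j]} == {"G", "C"})
    return {
        "stem_pairs": len(pairs),
        "terminal_loop": j_in - i_in - 1,
        "bulges": bulges,
        "interior_loops": interior,
        "gc_pair_fraction": gc / len(pairs),
    }

# Made-up toy hairpin with one single-base bulge on the 5' strand
features = hairpin_features("GCAAGGGAAACCUGC", "(((.((....)))))")
```

A vector of such descriptors, computed for known pre-miRNAs and for pseudo hairpins, is the kind of input a classifier such as an SVM would be trained on.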
Keywords/Search Tags: ncRNA sequence-structure alignment, Quantum Genetic Algorithms, ncRNA secondary structure predictions, features selection, Support Vector Machine, precursor microRNA identification, topological feature weighted by free energy