
Noncoding RNA Identification Based On Topology Secondary Structure And Reading Frame In Organelle Genome Level

Posted on: 2017-03-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Y Wu
Full Text: PDF
GTID: 1220330485966595
Subject: Biophysics
Abstract/Summary:
With the rapid development of functional genomics, the functions of non-coding transcripts are attracting increasing attention. More than 98% of the human genome does not encode proteins, but it was unexpectedly found to be transcribed into RNAs. Non-coding RNA (ncRNA) genes are transcribed in the same way as protein-coding genes, but ncRNAs function directly as RNAs rather than serving as blueprints for proteins. Experimental results show that some ncRNAs act as molecular switches that regulate gene expression and play a key role in protein translation. In recent years, ncRNAs have also been found to be related to human diseases, DNA damage repair and plant stress responses.

Data on the RNA world have been accumulating. Short and long nuclear-encoded ncRNAs that regulate mitochondrial functions or mitochondrial dynamics have been identified. A number of miRNAs control nuclear genes involved in organelle functions and thus help regulate mitochondrial metabolism and morphology, as well as mitophagy and mitochondrion-mediated apoptosis.

Conversely, information on ncRNAs in organelle genetic compartments has long remained limited. As the study of ncRNAs at the organelle genome level deepens, it has been found that recognizing ncRNAs at this level helps in further understanding the functions of ncRNAs from different organelle genomes. In this research, we systematically studied ncRNA dataset construction, feature extraction and feature-parameter optimization at the organelle genome level. We also established a prediction algorithm for recognizing ncRNAs from different organelle genomes, and generalized the method.

The accumulation of "omics" data has established that organismal complexity scales not with gene number but with gene regulation.
Small interfering RNAs (siRNAs) and microRNAs (miRNAs) have been identified as major actors in most biological functions. Finally, taking microRNA as an example, we examined how a non-coding RNA and its targets regulate breast tumor progression and tumor differentiation. Because different microRNAs may act cooperatively when they share target genes, we analyzed the features of 15 microRNA primary sequences and the expression profiles of their target genes in tumor and normal breast tissue. The main research topics are summarized as follows:

1. Based on experimental data, a benchmark dataset of ncRNA sequences from organelle genomes was built for the first time from NONCODE v3.0. Furthermore, to estimate the effectiveness of the prediction method and the effect of sequence identity on the prediction results, 361 ncRNAs were selected with the culling program CD-HIT at a sequence identity cutoff of 80%. Based on the physical and chemical properties of the four bases, the physicochemical features of ncRNA sequences from different organelle genomes are discussed. Considering that the reading frame may play an important role in determining the organelle genome of an ncRNA, the n-mer components, the structure-sequence triplets and the degeneracy of genetic codons under the reading frames were selected as ncRNA features. Results calculated with the different reading frames and without a reading frame indicate that the first reading frame (W1) is the most effective for recognizing ncRNA locations.

2. Because the structural and motif information of ncRNAs can reflect their more realistic spatial conformation and conserved local structure, the topological secondary structure and conserved motifs were selected for the first time as feature parameters for ncRNA identification at the organelle genome level.
To reduce the number of features, two dimension-reduction methods are proposed: mapping the original features to a lower dimension, and optimizing the feature set with incremental feature selection (IFS) based on the Maximum Relevance Minimum Redundancy (mRMR) technique. The increment of diversity (ID) classifier, the K-nearest neighbor (KNN) classifier and the support vector machine (SVM) are integrated to build the fusion algorithms: the increment of diversity combined with support vector machine (ID-SVM), the improved K-minimum increment of diversity classifier (iK-MID) and the improved K-nearest neighbor classifier (iKNN). By comparing the various algorithms, a more effective theoretical model for ncRNA identification at the organelle genome level is explored.

3. The sequence features of 15 miRNAs from a specific miRNA gene cluster (the hsa-miR-17-92 gene cluster) and its paralogous clusters, and the expression of their target genes at different group levels, were studied with bioinformatics tools. The regulation of downstream genes by the miRNAs is briefly explained in terms of a feedback mechanism. These results may be useful for related experiments.
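
As a rough illustration of two ingredients named in the abstract, reading-frame n-mer composition and the increment of diversity (ID), the sketch below counts 3-mers read in steps of three from a chosen frame offset and assigns a query to the class profile with the smallest ID. It uses the commonly cited form of the measure, D(X) = N ln N - sum(n_i ln n_i) and ID(X, S) = D(X + S) - D(X) - D(S); the abstract itself does not spell these out. The function names, the frame convention (W1 taken as offset 0), the class labels and the toy sequences are all assumptions made for illustration, not the dissertation's implementation or data.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; T is mapped to U below


def frame_kmer_counts(seq, k=3, frame=0):
    """Count k-mers read in steps of 3, starting at offset `frame` (0, 1 or 2).

    Assumption: offset 0 plays the role of the first reading frame (W1).
    K-mers containing symbols outside ACGU are ignored.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(frame, len(seq) - k + 1, 3))
    return [counts.get(kmer, 0) for kmer in kmers]


def diversity(counts):
    """D(X) = N ln N - sum(n_i ln n_i), with 0 ln 0 taken as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, s):
    """ID(X, S) = D(X + S) - D(X) - D(S); small values mean similar composition."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)


def predict_by_min_id(query, class_profiles):
    """Return the label whose profile has the smallest ID with the query."""
    return min(
        class_profiles,
        key=lambda label: increment_of_diversity(query, class_profiles[label]),
    )


if __name__ == "__main__":
    # Hypothetical toy profiles; a real profile would pool counts over a training set.
    profiles = {
        "organelle_A": frame_kmer_counts("AUGAUGAUGAUG"),
        "organelle_B": frame_kmer_counts("CCCGGGCCCGGG"),
    }
    query = frame_kmer_counts("AUGAUG")
    print(predict_by_min_id(query, profiles))
```

A minimum-ID rule like this is only the simplest use of the measure. The fusion algorithms described above (ID-SVM, iK-MID, iKNN) combine ID with other classifiers and are not reproduced here.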
Keywords/Search Tags: Non-coding RNAs, Organelle genomes, The open reading frame, The topological secondary structure, MicroRNA cluster, Function and pathway enrichment analysis