High-throughput Screening and Annotation Platform Building of lncRNAs Related to Mouse Brain Development

Posted on: 2016-08-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Lv
GTID: 1220330479978857
Subject: Biomedical engineering
Abstract/Summary:
Long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nt that play important roles in biological processes including embryonic development, cancer, pain and inflammation. However, current databases contain too few mouse lncRNAs, and even fewer that are functionally annotated. The brain is the major organ in which lncRNAs are expressed, so predicting brain-expressed lncRNAs is important both for comprehensively identifying mouse lncRNAs related to brain development and for understanding their roles in brain development. In addition, integrating and annotating predicted and known lncRNAs and storing them in a specialized database is important for the standardization and reuse of lncRNAs. The mouse Encyclopedia of DNA Elements (ENCODE) project has produced a large amount of high-throughput data, including RNA sequencing (RNA-Seq) and chromatin immunoprecipitation sequencing (ChIP-Seq), across a large number of tissues and cell lines, which offers a new perspective for predicting novel lncRNAs. Therefore, a large number of RNA-Seq datasets from multiple tissues and cell lines were collected. Based on these data, novel lncRNAs were identified and then characterized from genomic, transcriptomic, epigenomic and functional genomic aspects to demonstrate their validity. Brain development related lncRNAs were screened using features selected by a model. Integrating known lncRNAs with those predicted from large-scale RNA-Seq data, followed by building an lncRNA annotation platform and developing analytical tools, would facilitate their use by researchers.

In this thesis, the RNA-Seq pipeline is optimized to screen for lncRNAs related to embryonic brain development, including intergenic, intronic and cis-antisense types.
The novel lncRNAs related to embryonic brain development were compared with known lncRNAs and protein-coding transcripts using genomic, transcriptomic, epigenomic and functional genomic approaches. This characterization revealed that the gene structure of novel lncRNAs is relatively intact. Novel lncRNAs were shown to have relatively low protein-coding potential and high tissue specificity comparable with that of known lncRNAs, and to be associated with classical chromatin modifications. Functional enrichment analysis and RNA interference based analysis indicated that embryonic development related lncRNAs tend to function by potentially regulating brain development and by binding to transcription factors. Furthermore, randomly chosen lncRNAs were experimentally confirmed to have relatively high tissue specificity, and lncRNAs may be regulated by imprinting mechanisms.

Secondly, a LASSO-regularized logistic regression model is used to screen for genomic and epigenomic differences between lncRNAs and protein-coding transcripts. Because the model uses chromatin modification data from three developmental stages, the identified differential features are considered to relate to brain development and were used to screen for brain development related lncRNAs. Ten-fold cross-validation and testing on independent test data showed that the feature selection model performs well and that its performance is comparable with that of models using only genomic features or only chromatin features, suggesting that a few features play the major roles in predicting lncRNAs. Candidate lncRNAs predicted from RNA-Seq data of three developmental stages were further filtered using the feature selection model. The validity of the filtered brain development related lncRNAs was demonstrated by characterizing them with genomic, transcriptomic and functional genomic approaches.
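To illustrate how L1 (LASSO) regularization can perform the kind of feature selection described above, here is a minimal sketch; it is not the thesis implementation. It fits a logistic regression by proximal gradient descent with soft-thresholding, in pure Python. The function names, the toy data, the two feature names and all hyperparameter values (`lam`, `lr`, `n_iter`) are assumptions made for this example and do not come from the thesis.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam, lr=0.5, n_iter=500):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    X: list of feature rows, y: list of 0/1 labels,
    lam: L1 penalty strength (the bias term is not penalized).
    Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        # Residuals p_i - y_i of the current model.
        residuals = [
            sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
            for row, yi in zip(X, y)
        ]
        grad_w = [sum(r * row[j] for r, row in zip(residuals, X)) / n
                  for j in range(d)]
        grad_b = sum(residuals) / n
        # Gradient step followed by soft-thresholding of each weight.
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: label 1 = lncRNA, label 0 = protein-coding transcript.
# The first feature tracks the label; the second is unrelated to it.
feature_names = ["chromatin_mark_signal", "uninformative_feature"]
X_toy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_toy = [1, 1, 0, 0]

weights, bias = fit_lasso_logistic(X_toy, y_toy, lam=0.05)
selected = [name for name, wj in zip(feature_names, weights) if wj != 0.0]
print("selected features:", selected)
```

Features whose weights are driven exactly to zero are dropped from the model, which is why an L1 penalty can double as a feature selector. In practice the penalty strength would be chosen by cross-validation, as the ten-fold procedure mentioned above suggests.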
Analysis of the relationship between lncRNAs and neighboring protein-coding genes showed that lncRNAs tend to be co-expressed with nearby protein-coding genes, suggesting that lncRNAs may regulate nearby genes. Analysis of lncRNA specificity with the model showed that the expression specificity of lncRNAs during brain development is regulated by developmental stage specific chromatin modifications, such as H3K4me1 and H3K36me3. In addition, the specificity of lncRNAs was not regulated by genomic features, suggesting that the LASSO model is capable of recognizing lncRNAs with brain developmental stage specificity. In situ hybridization validated the brain developmental stage specificity of randomly chosen lncRNAs, while semi-quantitative PCR suggested that lncRNAs with embryonic developmental stage specificity tend to be specific to brain tissue rather than to other tissues at the same developmental stages.

Thirdly, the number of lncRNAs in public databases is smaller than expected, which prompted us to merge known lncRNA annotations with novel lncRNAs predicted from large-scale RNA-Seq data. The merged set contains over 260 000 lncRNA transcripts and is termed the lncRNA collection. Novel lncRNAs account for 75% of the collection, hinting that many mouse lncRNAs had not yet been publicly reported. Analysis of the collection found that novel lncRNAs are brain specific but not developmental stage specific during brain development. Weighted co-expression network analysis of novel lncRNAs and known transcripts found 57 modules, of which the modules with brain function were analyzed by expression heatmaps and GO biological process enrichment. The results suggested an enrichment of brain specific genes in these modules and laid the foundation for functional annotation.
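The first step of a weighted co-expression network analysis can be sketched as follows, as an illustration only. The sketch computes Pearson correlations between expression profiles and raises their absolute values to a soft-thresholding power to obtain an unsigned weighted adjacency. Module detection, which follows this step in a full analysis, is not shown. The transcript names, the expression values and the power `beta=6` are assumptions made for this example and do not come from the thesis.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned weighted adjacency: a_ij = |cor(x_i, x_j)| ** beta."""
    names = list(profiles)
    adjacency = {}
    for a in names:
        for b in names:
            if a != b:
                r = pearson(profiles[a], profiles[b])
                adjacency[(a, b)] = abs(r) ** beta
    return adjacency


# Hypothetical expression values across four samples.
profiles = {
    "lnc_A": [1.0, 2.0, 3.0, 4.0],
    "gene_B": [2.0, 4.0, 6.0, 8.0],    # rises together with lnc_A
    "gene_C": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with lnc_A
}
adjacency = soft_threshold_adjacency(profiles)
for pair, weight in sorted(adjacency.items()):
    print(pair, round(weight, 3))
```

Raising correlations to a power keeps strong co-expression links and suppresses weak ones without imposing a hard cutoff, which is the usual motivation for a weighted network.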
A total of 12 548 lncRNAs predicted to be functional were screened, including 3 128 lncRNAs predicted to relate to brain function, based on a cutoff determined in a randomization experiment. A guilt-by-association approach was further used to predict the functions of novel lncRNAs. The number of functions predicted for novel lncRNAs was one fold more than that obtained by the weighted co-expression network approach, and the number of function terms involved was more than 2-fold that obtained by the weighted co-expression network approach, highlighting the efficiency of guilt by association in predicting lncRNA function. Cross-validation and independent testing provided initial support for the validity of guilt by association.

Lastly, 246 464 lncRNAs expressed in brain were filtered from the lncRNA collection. Genomic and functional annotation of these lncRNAs revealed that genomic annotation covers less than one third of all lncRNAs, whereas nearly all lncRNAs can be located in the mouse genome by Entrez Gene ID, suggesting that lncRNAs can be queried from the lncbrain annotation platform through this ID. The annotations are stored in the lncbrain annotation platform, which has a well-designed infrastructure and visualization interface to respond to queries quickly. The platform not only provides precalculated genomic annotation but also supports analytical modules for simultaneous genomic and functional genomic analysis. In addition, the usage of the lncbrain platform is described in detail in this thesis.

Taken together, a great many lncRNAs expressed during brain development were screened and integrated into an lncRNA collection in this thesis. The identified lncRNAs were filtered from the collection and characterized by genomic, transcriptomic, epigenomic and functional genomic annotations. The platform helps researchers screen for lncRNAs related to brain function and helps bioinformaticians perform large-scale analyses of lncRNAs.
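The guilt-by-association idea mentioned above can be sketched in a few lines: an unannotated lncRNA inherits the function terms that are common among the annotated genes it is co-expressed with. This is a simplified illustration, not the procedure used in the thesis. The function name, the `min_fraction` cutoff, and the gene and term names are all assumptions made for this example.

```python
from collections import Counter


def predict_terms(neighbors, annotations, min_fraction=0.5):
    """Assign to each lncRNA the terms shared by enough annotated neighbors.

    neighbors: lncRNA -> list of co-expressed genes
    annotations: gene -> set of function terms
    min_fraction: minimum share of annotated neighbors carrying a term
    Returns lncRNA -> {term: fraction of annotated neighbors with that term}.
    """
    predictions = {}
    for lnc, genes in neighbors.items():
        annotated = [g for g in genes if g in annotations]
        if not annotated:
            predictions[lnc] = {}  # nothing to transfer
            continue
        counts = Counter(t for g in annotated for t in annotations[g])
        predictions[lnc] = {
            term: count / len(annotated)
            for term, count in counts.items()
            if count / len(annotated) >= min_fraction
        }
    return predictions


# Hypothetical co-expression neighbors and annotations.
neighbors = {"lnc_X": ["g1", "g2", "g3"], "lnc_Y": ["g4"]}
annotations = {
    "g1": {"neuron differentiation"},
    "g2": {"neuron differentiation", "axon guidance"},
    "g3": {"immune response"},
}
print(predict_terms(neighbors, annotations))
```

A fixed fraction is the simplest possible cutoff. A statistical enrichment test against a randomized background, in the spirit of the randomization experiment mentioned above, would be a more careful choice.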
Keywords/Search Tags: long non-coding RNAs, brain development, RNA-Seq, annotation platform, co-expression