
Database Construction And Bioinformatics Analysis Of Plant Circular RNAs (circRNAs)

Posted on: 2020-01-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q J Chu
GTID: 1360330575996002
Subject: Bioinformatics
Abstract:
Circular RNA (circRNA) is a kind of single-stranded, covalently closed RNA loop. CircRNAs were first found over twenty years ago and were considered byproducts of splicing with no biological function. With the rapid development of high-throughput sequencing and bioinformatics technology, a large number of circRNAs have been identified in diverse organisms, from mammals such as mouse and human and plants such as rice and Arabidopsis to fruit fly, worm and protists. Some circRNAs have been shown to have biological functions, such as working as miRNA sponges to regulate the expression of miRNA targets, combining with snRNP (small nuclear ribonucleoprotein) to regulate the transcription of parental genes, and acting through proteins. However, compared with circRNA research in animals, there are fewer publications and fewer research achievements in plants. We therefore studied plant circRNAs and obtained the following results.

(1) We set up the first plant circRNA database in the world, termed PlantcircBase (http://ibi.zju.edu.cn/plantcircbase/). In the latest version, PlantcircBase contains a total of 115,171 circRNAs from 16 different plant organisms. The circRNAs were collected from publicly available publications or identified by ourselves using public RNA-seq data and several circRNA detection tools. In addition, circRNAs were classified into ten types in this study, based on the annotated genomic features at which the two back-splicing sites of a circRNA are located. We also set up a generic nomenclature system for circRNAs based on their parental genes and classified types, for example, "AT1G01010_circ_g.1". Each circRNA entry in PlantcircBase contains information such as name, position, sequence, type, parental gene name, splicing signal, whether the circRNA is on an exon boundary, coding potential, conservation information, circRNA-miRNA-mRNA interaction, and so on. Apart from the entry information of each circRNA, PlantcircBase provides other tools, such as browsing, searching by keywords or sequences, downloading, visualizing the structures of circRNAs, a genome browser of circRNAs, and so on.

(2) By analyzing the circRNAs collected in PlantcircBase, we found that plant circRNAs have the following characteristics: their expression patterns are diverse among different tissues and at different developmental stages; their genomic length ranges from 100 bp to 500 bp; most are generated from annotated genes; alternative splicing of plant circRNAs is widespread; and plant circRNAs have diverse non-canonical splicing signals.

(3) We developed two bioinformatics tools for plant circRNAs. The first predicts the formation mechanisms of circRNAs. Based on the three best-known formation mechanisms, i.e. RBP-driven circularization, circularization by reverse complement sequences, and circularization by lariat structure, we developed software for predicting the formation mechanisms of circRNAs (available at https://github.com/qjchu/BMPcirc). Applying our software to circRNAs identified from human, mouse, worm, Arabidopsis and rice, we found that it could successfully output prediction results for some circRNAs (for those that could not be predicted, one possible reason is that their formation mechanisms are not among the three well-known mechanisms), and that the potential formation mechanism of most circRNAs was RBP-driven circularization. The second tool detects fusion circRNAs. Our fusion circRNA identification tool (available at https://github.com/qichu/find_fcRNA) is based on a pseudo-reference, which is generated by extracting the exon sequences of fusion gene pairs and arranging them in a staggered manner. RNA-seq data are then mapped to the pseudo-references to find the back-splicing sites of fusion circRNAs. On simulated data from human and rice, the sensitivities were both over 70% and the precisions were both over 90%. On rRNA-depleted RNA-seq data from human, worm and rice, our software also performed well and found a number of candidate fusion circRNAs.
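
As an illustration of the "circularization by reverse complement sequences" mechanism named in part (3), the following minimal Python sketch looks for short segments in the upstream flanking sequence of a circRNA whose reverse complement occurs in the downstream flanking sequence. It is not the BMPcirc implementation: the function names, the exact-match k-mer approach, and the default `k=8` are all assumptions made for this example.

```python
# Hypothetical sketch: find candidate reverse-complementary segments
# between the two flanking sequences of a back-splicing junction.
# Not taken from BMPcirc; names and the k-mer strategy are illustrative.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def shared_rc_kmers(upstream, downstream, k=8):
    """List (position, k-mer) pairs from the upstream flank whose
    reverse complement occurs exactly in the downstream flank."""
    upstream = upstream.upper()
    downstream = downstream.upper()
    # Index every k-mer of the downstream flank for fast lookup.
    down_kmers = {downstream[i:i + k] for i in range(len(downstream) - k + 1)}
    hits = []
    for i in range(len(upstream) - k + 1):
        kmer = upstream[i:i + k]
        if reverse_complement(kmer) in down_kmers:
            hits.append((i, kmer))
    return hits


if __name__ == "__main__":
    # Toy flanking sequences (made up for the example).
    up = "CCCCACGGTCATCCCC"
    down = "TTTTATGACCGTTTTT"
    print(shared_rc_kmers(up, down, k=8))
```

A pair of flanks with one or more hits could be treated as a candidate for this mechanism; a practical tool would more likely use an alignment that tolerates mismatches rather than exact k-mer matches.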
Keywords: plant circRNAs and their characteristics, PlantcircBase, formation mechanism, fusion-circRNA, bioinformatics tools