
Identification And Analysis Of LncRNAs In Major Cereal Species

Posted on: 2019-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: G F Wang
GTID: 2370330545985520
Subject: Bioinformatics
Abstract/Summary:
Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that do not encode a protein. LncRNAs play an important role in the regulation of gene expression in eukaryotes. Many lncRNAs are thought to be involved not only in regulating mRNA expression but also in the growth and development of organisms, and they have been implicated in a number of diseases. LncRNAs have been the subject of many functional and genomic studies in mammals, whereas studies of lncRNAs in plants remain comparatively scarce. With advances in sequencing technology, methods for predicting lncRNAs have improved significantly. With its much longer read lengths, third-generation sequencing has greatly facilitated and accelerated lncRNA research. Complete transcript sequences can now be obtained directly, skipping the error-prone transcript assembly step, which has been a leading cause of errors in splicing analyses of transcriptome data.

In this study, we used PacBio single-molecule real-time sequencing to sequence the transcriptomes of four gramineous crops: rice, Setaria, sorghum and Brachypodium. Bioinformatics methods were then used to predict lncRNAs in the four species: 260 in rice, 647 in Brachypodium, 437 in sorghum, and 1873 in rice [sic]. We examined the length distribution of lncRNAs in the four species and found that sorghum lncRNAs were the longest, with an average length of 1.2 kb, and Brachypodium lncRNAs were the shortest, with an average length of 0.8 kb. Most lncRNAs in these four species consist of a single exon. We clustered a total of 4863 lncRNAs from the four species using CD-HIT, which yielded 412 clusters, and performed cross-species analyses of the identified lncRNA families on the basis of these sequence clusters. We also examined the chromosomal distribution of lncRNAs in rice, Setaria, sorghum and Brachypodium. The distribution was similar across the four species: in all of them, lncRNAs were enriched in the pericentromeric regions. We hypothesized that this might be related to low-repeat regions and to regions of low CG/CHG methylation. In addition, we selected millet among the four species for further study and classified its lncRNAs by their position on the reference genome. Of these, 64% were from intergenic regions, 18% from the antisense strand, 16% from the sense strand, and 2% from intronic regions.

Finally, to allow the data from these analyses to be queried efficiently and to make the lncRNA resources more widely accessible to the scientific community, we created a lncRNA database for Gramineae: DGL. The database URL is: http://lncma.camdb.org. The database collects lncRNAs from rice, Setaria, sorghum and Brachypodium and provides annotations for them. It also lets users view the chromosomal distribution of lncRNAs from the four species, which facilitates further detailed study of related lncRNAs. In addition, the database provides a BLAST function for querying lncRNA annotations by sequence homology. To display the genomic context in rice, Setaria, sorghum and Brachypodium, we also added the JBrowse genome browser to the database to visualize the genomic neighborhood of lncRNA loci. With it, users can view lncRNA transcript information for the four species together with the exon/intron boundaries of nearby protein-coding genes.
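The positional classification described in the abstract (intergenic, antisense, sense, intronic) can be illustrated with a minimal Python sketch. The abstract does not give the rules the thesis actually used, so the ones below are assumptions: a transcript with no overlapping gene is intergenic; one that overlaps an exon is sense or antisense depending on strand; one that lies inside a gene span but touches no exon is intronic. The names `Gene`, `Transcript`, `classify` and `class_percentages` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


@dataclass(frozen=True)
class Gene:
    chrom: str
    strand: str                  # "+" or "-"
    exons: Tuple[Interval, ...]  # exon coordinates of the gene

    @property
    def span(self) -> Interval:
        # Gene span runs from the first exon start to the last exon end.
        return (min(s for s, _ in self.exons), max(e for _, e in self.exons))


@dataclass(frozen=True)
class Transcript:
    chrom: str
    start: int
    end: int
    strand: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify(tx: Transcript, genes: Iterable[Gene]) -> str:
    """Assign one positional class under the assumed rules above."""
    region = (tx.start, tx.end)
    hits = [g for g in genes if g.chrom == tx.chrom and overlaps(region, g.span)]
    if not hits:
        return "intergenic"
    # Genes whose exons the transcript actually touches.
    exonic = [g for g in hits if any(overlaps(region, ex) for ex in g.exons)]
    if any(g.strand == tx.strand for g in exonic):
        return "sense"
    if exonic:
        return "antisense"
    return "intronic"  # inside a gene span but clear of every exon


def class_percentages(txs: Iterable[Transcript], genes: Iterable[Gene]) -> Dict[str, float]:
    """Percentage of transcripts falling in each positional class."""
    gene_list = list(genes)
    counts = Counter(classify(t, gene_list) for t in txs)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}
```

Applied to a real annotation, the gene list would be loaded from the reference GFF/GTF file and the transcripts from the predicted lncRNA set; the linear scan over genes is kept for clarity and would normally be replaced by an interval index.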
Keywords/Search Tags:LncRNA, Bioinformatics, Sequencing, Database