
Data Analysis And Application Of Full Length LncRNA Based On Nanopore Long Read Sequencing Technology

Posted on: 2022-05-11 | Degree: Master | Type: Thesis
Country: China | Candidate: K Q Yu
GTID: 2480306740979559 | Subject: Biomedical engineering
Abstract/Summary:
Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that are not translated into protein. LncRNAs are ubiquitous in eukaryotic cells. Recent studies have shown that lncRNAs participate in cell regulation and communication by acting as signal molecules, molecular decoys, protein scaffolds, and so on. In addition, lncRNAs have been found to be closely related to a variety of diseases, so identifying lncRNAs can help reveal the mechanisms that link them to disease.

With the continuous development of sequencing technology, third-generation sequencing with ultra-long read lengths is now widely used. It avoids the assembly errors caused by short-read second-generation sequencing and yields full-length transcripts directly. Library preparation based on poly(A) tail enrichment is well established, but the identification of transcripts that lack a poly(A) tail still needs to be explored.

In this study, the hepatocellular carcinoma cell line HepG2 and the normal human hepatocyte cell line HL7702 were used as experimental samples. Poly(A) RNA was enriched with oligo(dT) to separate poly(A)+ and poly(A)- libraries, and the Nanopore long-read sequencing platform was used to identify reliable full-length transcripts. LncRNAs were filtered according to their definition, and the differences in features, functions, and expression between the poly(A)+ and poly(A)- libraries were analyzed. The conclusions provide a biological basis for further exploring the functional mechanisms of lncRNAs in tumorigenesis. The full-length lncRNA analysis workflow built in this study on long-read sequencing can also provide methodological guidance for identifying full-length lncRNAs without poly(A) tails.

This study is divided into the following three parts:

1. Oligo(dT) was used to enrich RNA containing a poly(A) tail, dividing the rRNA-depleted RNA of HepG2 liver cancer cells and HL7702 normal liver cells into poly(A)+ and poly(A)- libraries. After reverse transcription into cDNA with PCR primers, libraries were constructed and sequenced on the Nanopore MinION platform. After basecalling, low-quality read filtering, genome alignment, and full-length read counting, reliable full-length transcripts were obtained with FLAIR, and the proportion of full-length transcripts was compared across the four libraries. The poly(A)+ libraries had a high proportion of known full-length transcripts. In the poly(A)- libraries, the proportion of novel-in-catalog (NIC) and novel-not-in-catalog (NNC) transcripts was relatively high, and a large number of transcripts were located in intergenic regions.

2. gffcompare was used to classify the transcripts of the four libraries, and the potential lncRNA transcripts from this classification were passed to the next step. The main filter conditions were: length greater than 200 nt, low coding potential, ORF length less than 300 nt, and no match between the encoded peptide and known protein domains. The reliable lncRNAs were then compared with known lncRNAs in an lncRNA database. LncRNAs with a match rate higher than 90% were labeled as known lncRNAs, and the others were labeled as novel lncRNAs. Feature analysis showed that lncRNAs in the poly(A)- libraries had shorter sequences and a higher proportion of single-exon transcripts, and that the lncRNAs differed considerably between libraries.

3. Alternative splicing analysis detected a large number of alternative splicing events in the transcripts. Differential expression of lncRNAs was compared between the HepG2_poly(A)+ and HL7702_poly(A)+ libraries, and between the HepG2_poly(A)- and HL7702_poly(A)- libraries. Functional annotation, including GO enrichment and KEGG pathway analysis, was then performed on genes within 10 kb upstream and downstream of the differentially expressed lncRNAs. The enrichment results in the poly(A)+ group were mostly related to cell communication and protein binding, while those in the poly(A)- group were mostly related to cellular regulation of interactions between biological macromolecules.
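The lncRNA filter conditions and the known/novel labeling described in part 2 can be illustrated with a minimal Python sketch. It is not the pipeline used in the thesis. The `Transcript` fields, the function names, and the 0.5 coding-probability cutoff are hypothetical; the coding-potential score, protein-domain flag, and database match rate are assumed to come from external tools. Only the 200 nt, 300 nt, and 90% thresholds are taken from the abstract.

```python
from dataclasses import dataclass

# Thresholds stated in the abstract.
MIN_LENGTH_NT = 200           # transcript must be longer than this
MAX_ORF_NT = 300              # longest ORF must be shorter than this
KNOWN_MATCH_THRESHOLD = 0.90  # match rate above this -> "known" lncRNA

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length (nt, stop codon included) of the longest ATG..stop ORF
    in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


@dataclass
class Transcript:
    # All fields are hypothetical names chosen for this illustration.
    transcript_id: str
    sequence: str
    coding_probability: float      # from an external coding-potential tool
    has_known_protein_domain: bool # from an external domain search
    best_known_lncrna_match: float # fraction (0-1) matched to a known lncRNA


def is_lncrna_candidate(t: Transcript, max_coding_probability: float = 0.5) -> bool:
    """Apply the four filter conditions; the 0.5 cutoff is an assumption."""
    return (
        len(t.sequence) > MIN_LENGTH_NT
        and longest_orf_length(t.sequence) < MAX_ORF_NT
        and t.coding_probability < max_coding_probability
        and not t.has_known_protein_domain
    )


def label_lncrna(t: Transcript) -> str:
    """Label a candidate as 'known' or 'novel' by its database match rate."""
    return "known" if t.best_known_lncrna_match > KNOWN_MATCH_THRESHOLD else "novel"


if __name__ == "__main__":
    example = Transcript("tx1", "ACGT" * 60, 0.1, False, 0.95)
    if is_lncrna_candidate(example):
        print(example.transcript_id, label_lncrna(example))
```

The ORF scan here covers only the forward strand and ignores ORFs without a stop codon; a real analysis would rely on a dedicated ORF finder and coding-potential predictor.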
Keywords/Search Tags:Long non-coding RNA, Full-length sequence analysis, Nanopore sequencing