Identification Of Transcriptional Regulatory Relationship Based On Multi-omics Data Analysis

Posted on: 2022-02-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Lu
GTID: 1480306566491934
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
The central dogma of molecular biology describes the flow of genetic information from DNA to RNA and then to protein. The human genome contains about 20,000–25,000 protein-coding genes, whose transcription and translation ultimately determine the structure and function of different cell types. Spatiotemporal gene expression is under strict control, regulated mainly by the epigenetic code and the transcription factor (TF) program. Genomic variants can affect expression regulation, and genomic location analysis indicates that most trait- and disease-associated variants lie in noncoding genomic regions. These noncoding variants do not change protein sequences, but regulate gene expression through more complex and diverse mechanisms. Research on genes and their expression regulation relies heavily on the advancement and application of sequencing technologies. The emergence of next generation sequencing (NGS) technology has greatly advanced genomics, epigenomics and transcriptomics, resulting in an explosion of multi-omics data. Based on these multi-omics data, researchers have been able to perform large-scale population analyses of the relationship between mutation and phenotype, which, however, mainly focus on the individual level. How to systematically annotate the regulatory effects of variants at the tissue and cell type levels is still a major challenge in the field of genetics.

In addition, traditional omics technologies usually work on tissue samples composed of different cell types, making it difficult to explore cellular heterogeneity. The recently emerging single-cell sequencing technologies can characterize the omics profiles of single cells accurately. Two single cell technologies are widely used in current studies: single cell RNA sequencing (scRNA-seq) and single cell assay for transposase-accessible chromatin using sequencing (scATAC-seq), which measure the transcriptome and the chromatin accessibility of single cells, respectively, in an unbiased manner. Based on these single cell technologies, we can identify key regulatory molecules at the single cell level and decipher cell type-specific regulatory relationships in various biological situations.

Deep learning, an advanced method in the field of machine learning (ML), provides a powerful tool for big data mining in multi-omics analysis. Through layer-by-layer abstraction in a neural network, deep learning can learn more fundamental representations of large datasets. With the continuous generation of multi-omics data and the development of new omics technologies, mining the existing multi-omics data and using new omics technologies efficiently make it possible to delineate the regulation of gene expression in diverse situations and to elucidate the molecular mechanisms underlying the generation and development of complex traits and diseases.

In this dissertation, the author focused on the identification of tissue and cell type-specific regulatory relationships. Based on the existing biological multi-omics data and the emerging single cell sequencing technologies, the author conducted research in the following two areas.

First, for the identification of tissue type-specific regulatory relationships, the author developed a novel method for tissue type-specific prioritization of noncoding regulatory variants. Based on the tissue type-specific expression quantitative trait loci (eQTL) data from the Genotype-Tissue Expression (GTEx) project, we constructed an annotation model, named RegVar, to identify tissue type-specific regulatory variants and their target genes in 17 human tissues. RegVar uses a deep learning algorithm to integrate and analyze the sequential, epigenetic, and evolutionary profiles of variants and their target genes. Compared with similar existing methods and annotation tools, RegVar showed higher prediction performance under various circumstances. To explore the extensibility of the RegVar framework, we constructed a simplified model for the identification of pathogenic variants from the Human Gene Mutation Database (HGMD), and the results showed that RegVar performed on par with the current state-of-the-art methods. To serve the research community, we built an online webserver for annotating regulatory noncoding variants (http://regvar.cbportal.org/) based on our RegVar model. How to link functional regulatory variants with their target genes is a major challenge in the field of genomics. By robustly learning the characteristics of massive variant-gene expression associations, RegVar can help to annotate and prioritize regulatory variants and their target genes.

Second, for the identification of cell type-specific regulatory relationships, the author conducted two studies.

(1) Based on single cell epigenomics analysis, we identified tumor-associated macrophage (TAM)-specific key regulatory TFs in hepatocellular carcinoma (HCC). With the scATAC-seq technology, we measured the chromatin accessibility of single cells in tumor and adjacent non-tumor tissues in HCC. According to cell type-specific chromatin accessible regions, we examined the cellular heterogeneity in HCC tissues, identified different cell types, and constructed a cell type-specific chromatin accessibility atlas in HCC. We found that immune cells, such as macrophages and T/NK cells, were the most abundant cell types in HCC. Through further characterization of TAMs, we deciphered TAM-specific accessible genomic sites and important TFs. Based on chromatin accessibility, pseudotime trajectory analysis revealed the developmental direction from newly recruited macrophages to TAMs and identified related key TFs. This research helps to delineate the origin of TAMs in HCC and the key regulatory relationships during their development, which provides a useful reference for studies on new target molecules for the diagnosis and treatment of HCC.

(2) Based on single cell transcriptomics analysis, we identified intestinal stem cell (ISC)- and macrophage-specific key regulatory TFs during radiation-induced intestinal injury (RIII) and the subsequent repair process. With the scRNA-seq technology, we measured the transcriptome of single cells from small intestine tissues under the homeostatic condition and at different times after radiation stimulation. According to the expression of canonical marker genes, we identified three ISC subtypes, namely LGR5+ CBCs, +4 RSCs and CLU+ revSCs, as well as new markers for them. Our results confirmed the existence of +4 RSCs and provided new specific molecules for reference. In addition, we identified resident and pro-inflammatory macrophage subtypes. Based on the single cell transcriptome, pseudotime trajectory analysis revealed the transition process among different ISC subtypes and between macrophage subtypes and monocytes, and identified related key TFs, which may be involved in the regeneration of ISCs and the differentiation of different macrophages. Future research on interventions targeting these important regulatory molecules may provide new directions for the study of intestinal repair and inflammation regulation after radiation.

In summary, this dissertation focused on the identification of tissue and cell type-specific regulatory relationships. The author developed a new deep learning method for tissue type-specific annotation of noncoding regulatory variants, and applied the emerging single-cell epigenomics and transcriptomics technologies to the development and regulation of TAMs in HCC tissues and of ISCs and macrophages in RIII and the subsequent repair process, respectively, thereby elucidating the corresponding cell type-specific regulatory relationships in various biological situations.
Keywords/Search Tags:gene expression regulation, multi-omics analysis, noncoding variants, deep learning, hepatocellular carcinoma, radiation-induced intestinal injury