
Identification And Annotation Of Genomic Regulatory Elements

Posted on: 2020-02-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Ren
Full Text: PDF
GTID: 1360330599452418
Subject: Bioinformatics
Abstract/Summary:
The rapid development of sequencing technology has revolutionized biological research. Next-generation sequencing enables researchers to perform large-scale sequencing at lower cost and to sequence whole genomes rapidly. This has greatly changed how researchers work and has promoted the development of multi-omics research. In addition, the emergence of a variety of sequencing methods targeting specific genomic information has made many high-precision analytical studies possible. The advent of single-cell sequencing has enabled researchers to study biological characteristics at the level of individual cells and has also enabled in-depth research on germ cells.

Building on the rapid development of sequencing technology, many important scientific research projects have become a reality. The ENCODE project focuses on the identification and annotation of genomic elements in multiple species, providing a large number of standard samples and reliable experimental data covering important elements and their characterization in multiple genomes. The Roadmap project profiles dozens of epigenetic signals across a number of different human tissues, cell lines, and stages, describing both the specific and the shared features of human epigenomes. The TCGA project collects and sequences a large number of samples from cancer patients, covering more than 30 types of cancer. In addition to large-scale sequencing, the TCGA project also produces a large amount of analytical data, including expression levels and mutation sites, for cancer researchers. Large multi-omics datasets and the corresponding large-scale data projects make integrated multi-omics analysis possible. The coverage of integrated multi-omics analysis is becoming more extensive, and its depiction of complex relationships is becoming more and more detailed. Through multi-omics analysis, the exploration of biological
and medical problems at the system level is becoming feasible.

The emergence of large and diverse multi-omics data also poses new challenges for bioinformatics. The first problem is how to use these massive data effectively. The potential connections and interactions embedded in multi-omics data are another new issue. Bioinformatics urgently needs new approaches that help researchers analyze multi-omics data in an integrated way. At the same time, omics data of different natures require bioinformatics researchers to develop new methods to characterize them.

Starting from the development of bioinformatics methods, this dissertation proposes a variety of analytical methods for common problems in multi-omics data and genomic regulatory elements, drawing on integrated multi-omics analysis of public big data and of experimental data from targeted designs. First, it develops new methods and software for the transcription factor binding site recognition problem, which is common in transcription-factor-related analysis. Next, it applies integrated multi-omics analysis to public big data to characterize and predict the properties and functions of a special type of genomic regulatory element, enhancer RNA. By further extending the multi-omics association analysis, a large amount of data on the potential functions of long non-coding RNAs is generated, and two reliable long non-coding RNA interaction databases are developed accordingly. Once a mature integrated multi-omics analysis method had been established, it was applied to experimental data from several targeted designs. Using experimental data from early mouse embryos, the mechanism behind the early embryonic developmental defects induced by maternal obesity and the regulatory mechanism behind the imbalance of allele expression in early mouse embryos were
studied.

Methodologically, this dissertation uses a variety of methods to study the identification and annotation of regulatory elements in the genome. It covers the development and implementation of identification methods for specific regulatory elements, as well as the establishment and application of integrated analysis pipelines for multi-omics data. The research includes the following aspects.

First, a new transcription factor binding site recognition algorithm was developed. To improve on existing transcription factor binding site recognition algorithms, this study achieved higher accuracy and sensitivity by integrating and reconstructing five existing algorithms at the base level. The resulting method, iForm, builds on the solid foundation of existing research and identifies transcription factor binding sites on genomic sequences based on position weight matrices. By integrating the recognition results of the five existing methods using the chi-square test, we obtained a more accurate and sensitive identification index. Testing the method on a gold standard set with a variety of evaluation metrics, we found that it exceeded the five existing methods in accuracy and sensitivity. In addition, we used iForm to identify new gold standard sets in a variety of tissues and cell lines, providing a database for subsequent research.

Second, based on large-scale multi-omics analysis, enhancer RNAs in various tissues and cell lines were identified and characterized, and their regulatory functions as potential regulatory elements were predicted from their properties. Using histone data and RNA-seq data from the Roadmap Epigenomics project, the study identified enhancer RNAs in up to 50 different tissues and cell lines. Various properties of these enhancer RNAs were then characterized and found to be
significantly different from other genomic components. The function of enhancer RNAs was then predicted. Given the importance of secondary structure for the function of RNA molecules, this study predicted the potential regulatory capacity of enhancer RNAs. By further linking the secondary structural changes of enhancer RNAs to disease-causing gene mutations, this study predicted the potential regulatory role of enhancer RNAs in a variety of immune diseases. Finally, by aggregating and mining existing results, this study proposes a model of how enhancer RNAs regulate other regulatory elements of the genome.

Next, by further extending the application of integrated multi-omics analysis, the regulation of long non-coding RNAs was annotated and predicted, and the results were summarized and reorganized. Two online databases were released, focusing on the interactions and regulatory relationships between long non-coding RNAs and other genomic components in a variety of cancers and other diseases. Using multiple quantitative analysis methods, this study evaluated the interactions between long non-coding RNAs and other genomic regulatory elements. It focused on three relationships: between the secondary structure of long non-coding RNAs and disease-causing genomic mutations, between RNA sequence and protein binding, and between the expression of long non-coding RNAs and that of co-expressed genes. The database Lnc2Catlas was developed and released by quantitatively evaluating these three relationships and summarizing the results. Furthermore, to meet researchers' need for reliable, experimentally validated long non-coding RNA interactions, this study combined word segmentation systems with manual labeling to label and classify the published literature on long non-coding RNA interactions. In this way, a
database, LIVE, containing a large number of experimentally verified long non-coding RNA interactions was introduced.

Then, through the analysis of changes in methylation regulation and protein levels in early mouse embryos, the regulatory genes and related mechanisms behind the early embryonic developmental defects caused by maternal obesity in mice were identified. By analyzing proteome differences in early embryos of obese mice, this study identified a series of candidate genes. Combining these with an analysis of methylome differences in early embryos singled out the Stella protein. Further studies revealed that Stella's protective effect on the demethylation process in early mouse embryos is one of the important factors ensuring early embryonic development. The loss of Stella in early embryos from obese mothers is a direct cause of the defective early developmental phenotypes. Furthermore, by constructing a Stella-deficient mouse model, this study investigated in depth the role of Stella in the changes of methylation levels from mouse oocytes to early embryos. Through integrated multi-omics analysis, this study explored and explained the mechanism behind the early embryonic defects caused by maternal obesity in mice, and made important contributions to the research and treatment of fetal defects caused by human obesity.

Finally, integrated multi-omics analysis was used to analyze allelic imbalance in early mouse embryos and to explore the main regulatory mechanisms and regulatory elements behind it. Based on a reciprocal-cross mouse model, this study identified and characterized allelic imbalances in early mouse embryos. By comparing the direction of allelic bias between the reciprocal crosses, this study also reveals how the main regulatory factors behind allelic imbalance change over time, indicating that the main
regulatory factors shift from maternal factors to random factors. By incorporating transcriptional regulatory network analysis, a variety of transcription factors consistent with the observed allelic imbalance and expression patterns were discovered. Finally, by integrating the existing results, a model was proposed that describes the dynamic changes of allelic imbalance in early mouse embryos and the changes in the regulatory factors behind them.
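
As an illustration of the integration step described for iForm above, the sketch below combines per-site p-values from five scanners into one score. The abstract says only that the five methods are integrated with a chi-square test; the use of Fisher's combined probability test here is an assumption, and the function name and the example p-values are hypothetical rather than taken from iForm.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (assumed here).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis, where k is the
    number of p-values being combined.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)

    # For an even number of degrees of freedom (2k), the chi-square
    # survival function has the closed form
    #   exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical p-values from five PWM-based scanners for one candidate site.
scanner_pvalues = [0.01, 0.04, 0.03, 0.20, 0.02]
statistic, combined = fisher_combined_pvalue(scanner_pvalues)
print(f"chi-square statistic = {statistic:.2f}, combined p = {combined:.2e}")
```

Fisher's method assumes the combined p-values are independent, which scanners sharing the same position weight matrix may not satisfy, so the combined value is better read as a ranking score than as a calibrated probability.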
Keywords/Search Tags: multi-omics analysis, genomic regulatory elements, methylation, transcription factor, long noncoding RNA, enhancer RNA, allelic imbalance in gene expression