Data Integration Optimization Model For 3D Genome Structure

Posted on: 2021-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: D S Yan
GTID: 2370330647957401
Subject: Mathematics
Abstract/Summary:
In the nuclei of humans and other mammals, chromosomes are not arranged linearly but fold and coil into a multi-level three-dimensional structure. Through this structure, genes and regulatory elements that are far apart in linear distance can come into varying degrees of proximity in three-dimensional space, and this spatial proximity is closely related to how gene expression is regulated. Three-dimensional genomics, an important way to explore the relationship between functional elements of the genome and gene regulation, has developed rapidly. With the continued development of sequencing methods for the 3D genome, its hierarchical structures, such as A/B compartments, topologically associating domains (TADs), and chromatin loops, have been revealed and intensively studied. Hi-C, a high-throughput chromatin conformation capture technology, is widely used and can reveal structural features of the 3D genome at different scales. However, the resolution of most published Hi-C data is relatively low because of technical limitations, and low-resolution Hi-C data is too coarse for an accurate and comprehensive study of high-level genome features. Recently, researchers have developed computational methods that use deep learning to infer high-resolution Hi-C interaction matrices from low-resolution Hi-C data. However, these methods aim purely at improving the resolution of Hi-C data and do not consider other data sources or downstream effects on gene regulation. Integrated modeling of multi-omics data makes it possible to view a complete biological process from a systems perspective and to discover interesting scientific regularities in it. This thesis uses optimization methods to integrate multi-omics data according to their data characteristics in order to improve Hi-C data resolution. The main contributions are:
(1) We developed a low-rank decomposition optimization method that integrates multi-resolution Hi-C data and information from multiple replicates to improve the resolution of Hi-C data. Because Hi-C data is individual-specific, multiple replicates are generally sequenced, and the Hi-C data of each sample yields chromatin interaction networks at multiple resolutions. We therefore propose a multi-sample, multi-resolution network optimization model that improves Hi-C resolution by extracting information from multiple replicate interaction matrices at different resolutions.
(2) We developed an optimization model that integrates other omics data to further boost Hi-C data resolution. We integrate three types of omics data: Hi-C, ATAC-seq, and RNA-seq. Taking Hi-C as the base data and combining it with the other two data types, we propose an optimization model for multi-omics data integration that improves resolution. We tested the multi-omics integration model on public data and obtained promising results.
(3) We applied our optimization method to Hi-C data for the study of high-altitude adaptation. Plateau adaptation is a complex trait, and studying its mechanism is of great significance for understanding evolution. We focus on Hi-C data related to plateau adaptation, covering data acquisition, processing, analysis, and application. The multi-omics integration model first improves data quality, and the improved data is then analyzed, forming a relatively complete workflow. By comparing Tibetans and Han Chinese at multiple structural levels, the work explains some mechanisms of Tibetans' plateau adaptation from the perspective of the three-dimensional genome.
Keywords/Search Tags: Optimization, 3D genomics structure, high altitude adaptation, multi-omics data integration, resolution improvement
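
The abstract does not state the objective function of the multi-sample, multi-resolution model in item (1), so the following is only a minimal sketch of the two ingredients it names, not the thesis's method. It assumes that (a) a low-resolution contact matrix can be modeled as block sums of a high-resolution one, and (b) replicates are combined by the symmetric rank-r matrix closest to all of them in Frobenius norm, which equals the truncated eigendecomposition of their mean. The function names `pooling_matrix` and `low_rank_consensus` are hypothetical, and the demo data is synthetic.

```python
import numpy as np


def pooling_matrix(n_fine: int, factor: int) -> np.ndarray:
    """Return P (n_coarse x n_fine) such that P @ H @ P.T sums
    factor-by-factor blocks of a fine-resolution matrix H."""
    if factor <= 0 or n_fine <= 0 or n_fine % factor != 0:
        raise ValueError("n_fine must be a positive multiple of factor")
    n_coarse = n_fine // factor
    # Row i holds ones over the fine bins that make up coarse bin i.
    return np.kron(np.eye(n_coarse), np.ones((1, factor)))


def low_rank_consensus(replicates, rank: int) -> np.ndarray:
    """Symmetric rank-`rank` matrix minimizing the summed squared
    Frobenius distance to all replicate matrices (assumed same shape)."""
    mean = np.mean(np.stack(replicates), axis=0)
    mean = (mean + mean.T) / 2.0  # enforce symmetry of the contact map
    if not 1 <= rank <= mean.shape[0]:
        raise ValueError("rank must be between 1 and the matrix size")
    vals, vecs = np.linalg.eigh(mean)
    # Keep the eigenpairs with the largest absolute eigenvalues.
    top = np.argsort(np.abs(vals))[::-1][:rank]
    return (vecs[:, top] * vals[top]) @ vecs[:, top].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    basis = rng.random((8, 2))
    truth = basis @ basis.T  # synthetic rank-2 "contact map"
    reps = []
    for _ in range(3):
        noise = rng.normal(scale=0.05, size=truth.shape)
        reps.append(truth + (noise + noise.T) / 2.0)
    estimate = low_rank_consensus(reps, rank=2)
    P = pooling_matrix(8, 4)
    coarse = P @ estimate @ P.T  # the same map at 4x coarser resolution
    print("fine-scale error:", np.linalg.norm(estimate - truth))
    print("coarse matrix shape:", coarse.shape)
```

A full model along the lines of the abstract would presumably couple these two pieces, fitting one fine-scale low-rank matrix to replicate matrices observed at several resolutions at once; that joint objective is not given here and is left out of the sketch.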