Research On Reconstruction Method Of Multiscale Chromatin Structure

Posted on: 2024-07-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Y Gong
Full Text: PDF
GTID: 1520306914474274
Subject: Computer Science and Technology

Abstract/Summary:
The structure of biological chromatin is closely related to the expression, transcription, and regulation of genes. Microscopy observes the spatial structure of chromatin only at low resolution and cannot precisely relate that structure to gene function. The rapid development of high-throughput chromosome conformation capture (Hi-C) technology provides high-resolution chromatin interaction data for studying multiscale chromatin spatial structure, thereby advancing studies of complex diseases and cancer. Based on Hi-C contact matrices at different resolutions, chromatin structure can be studied at different scales, including chromatin loops, topologically associated domains (TADs), and the global three-dimensional (3D) spatial structure of chromatin. However, the following problems remain: (1) the proportion of regulatory chromatin interactions identified directly from Hi-C data is too low; (2) existing algorithms for TAD identification and chromatin 3D structure reconstruction have low accuracy and high complexity on high-resolution Hi-C data. To address these problems, this dissertation uses deep learning, density-based clustering, nonlinear dimensionality reduction, and other computational methods to reconstruct chromatin structure at different scales from high-resolution Hi-C data. The main research contents are as follows:

1) An algorithm for identifying regulatory chromatin interactions based on multimodal feature fusion (MINE-Loop). To recognize a high proportion of regulatory chromatin interactions (RCIs) from low-resolution Hi-C data, we propose a neural network (MINE-Loop) that fuses multimodal features of Hi-C data and epigenome data, addressing the problem that RCI features cannot be learned directly from sparse Hi-C contact matrices. The method first analyzes the relationship between epigenome data features and RCIs in Hi-C data. It then uses the multimodal features of Hi-C data and epigenome data as the network input, and uses Hi-C data masked by the epigenome data as the target Hi-C data for training. Changing the type of epigenomic data used to generate the target Hi-C data enhances different types of RCI characteristics in the Hi-C data. The results show that, for different target data, a greater number of active (or repressive) chromatin interactions can be predicted than by identifying regulatory chromatin interactions directly from the raw Hi-C data.

2) An algorithm for identifying topologically associated domains based on hierarchical density-based clustering (CASPIAN). Existing TAD identification algorithms have too many parameters and fail to identify TADs from high-resolution Hi-C contact matrices. To address this, a novel TAD boundary identification method based on a hierarchical density-based clustering algorithm (CASPIAN) is proposed. First, the raw Hi-C data is denoised and normalized to obtain the normalized Hi-C contact matrix. Then, by analyzing the signal distribution characteristics of TADs in the Hi-C contact matrix, the distance between pairs of genomic loci is calculated using the Minkowski distance metric, and the genomic loci are clustered with a hierarchical density-based clustering method. Finally, TADs are identified from the clustering results. CASPIAN enables TAD boundary identification on Hi-C contact matrices at both low and high resolution. CASPIAN quantifies the proportion of CTCF, H3K4me3, and other factors anchored at TAD boundaries, and identifies euchromatin-associated and heterochromatin-associated TADs evenly.

3) Chromatin 3D structure reconstruction algorithms based on nonlinear dimensionality reduction and a divide-and-conquer strategy. The relationship between the distance of genomic loci in the Hi-C contact matrix and the 3D structure of chromatin is first analyzed. The Hi-C contact matrix is transformed into a distance matrix using a shortest path algorithm, and 3D structure reconstruction is defined as the problem of recovering 3D coordinates from the distance matrix. To address the high time complexity and low accuracy of reconstructing chromatin 3D structure from high-resolution Hi-C contact matrices, the Kullback-Leibler divergence is first used to measure the dissimilarity between the input distance matrix and the Euclidean distances obtained from the output 3D structure, and a low-resolution chromatin 3D structure reconstruction algorithm based on nonlinear dimensionality reduction (NeRV-3D) is proposed to reconstruct low-resolution chromatin 3D structure accurately. Then, to reduce the time of high-resolution 3D structure reconstruction, a high-resolution reconstruction algorithm based on a divide-and-conquer strategy (NeRV-3D-DC) is proposed. Compared with other existing low-resolution and high-resolution 3D structure reconstruction algorithms, NeRV-3D and NeRV-3D-DC both achieve higher similarity and lower RMSE of reconstructed structure distances, and the reconstructed structures are similar to the low-resolution 3D structure obtained by FISH.

4) A multiscale chromatin structure visualization system based on Hi-C data (MINE). Building on research contents 1, 2, and 3, a system named MINE, comprising the MINE-Loop, MINE-Density, and MINE-Viewer modules, is implemented to explore the relationship between the spatial density of regulatory chromatin interactions, changes in gene expression, and chromatin spatial structure. MINE-Loop enhances the detection of regulatory chromatin interactions; MINE-Density quantifies the spatial density of the regulatory chromatin interactions (SD-RCI) identified by MINE-Loop within different chromatin conformations; MINE-Viewer provides 3D visualization of the spatial density of specific factors. To apply the MINE system, the concepts of developed and developing active (or repressive) hubs are first proposed based on SD-RCI. MINE is then applied to Hi-C data generated from HeLa cells before and after treatment with 1,6-hexanediol to quantitatively describe the change in chromatin structure. The MINE system enables quantitative studies of different aspects of chromatin conformation and regulatory activity during cell differentiation.
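
The distance transformation described in research content 3 (Hi-C contact matrix to distance matrix, completed with a shortest path algorithm) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the inverse power-law conversion `d = 1 / c**alpha`, the default `alpha = 1.0`, the choice of Floyd-Warshall as the shortest path algorithm, and the function name `contacts_to_distances` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def contacts_to_distances(contacts, alpha=1.0):
    """Turn a symmetric contact matrix into an all-pairs distance matrix.

    Assumption (not from the abstract): direct distances follow
    d = 1 / c**alpha, and loci with zero contacts have no direct edge.
    Missing or long direct distances are then replaced by shortest
    paths through intermediate loci (Floyd-Warshall).
    """
    n = len(contacts)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            c = contacts[i][j]
            dist[i][j] = 1.0 / (c ** alpha) if c > 0 else math.inf

    # Floyd-Warshall: relax every pair (i, j) through every intermediate k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Toy 3-locus matrix: loci 0 and 2 never contact each other directly,
# so their distance is filled in through locus 1.
toy_contacts = [
    [0, 4, 0],
    [4, 0, 2],
    [0, 2, 0],
]
toy_distances = contacts_to_distances(toy_contacts)
```

In this toy case the missing distance between loci 0 and 2 becomes the sum of the two direct distances through locus 1, which is the property that makes a sparse high-resolution contact matrix usable as input for coordinate recovery.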
Keywords/Search Tags:Hi-C data, Multiscale chromatin structure, Multimodal feature fusion, Hierarchical density-based clustering, Nonlinear dimensionality reduction