
Identification Of Chromatin Loops Based On Multi-Omics Features

Posted on: 2024-07-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Tang
Full Text: PDF
GTID: 1520307310982319
Subject: Computer Science and Technology
Abstract/Summary:
In the nuclei of eukaryotic cells, chromatin is folded into complex three-dimensional structures, and chromatin loops bring genes and regulatory elements that are far apart on the linear genome close together in space. Loops play an important role in maintaining normal gene expression levels, marking cell specificity, and regulating cell growth and development. Identifying loops and investigating their regulatory functions can help explain the molecular mechanisms of disease onset and identify potential therapeutic targets, with implications for disease diagnosis, treatment, and related biological research. This study investigates loop identification and loop-disease association using large, complex multi-omics data, from perspectives such as tensor decomposition and model fusion. The main contributions are summarized as follows.

(1) To address the low accuracy of current unsupervised loop identification algorithms and their inapplicability to multiple data types, we propose PCEC-Loop, an unsupervised loop identification method based on a combination of strategies. A feature tensor is constructed along different dimensions by integrating features such as the binding of loop extrusion-related proteins, histone modifications, and base sequence orientation. PCEC-Loop combines peak detection based on the Poisson distribution, base sequence matching scores, and tensor decomposition to calculate association scores between loop anchors. Compared with existing unsupervised methods, PCEC-Loop improves loop prediction performance and can identify loops with activating and with inhibitory characteristics at the same time.

(2) Because 3D genomics experiments are costly, slow, and prone to failure, carrying out high-resolution chromatin conformation capture experiments for uncharacterized species and cell lines remains a challenge. To address the inability of current algorithms to predict loops across cell lines and species, we propose LoopPredictor, a loop identification method based on multi-omics integration learning. The method builds a classifier on a multi-task framework to identify anchor pairs, and classifies loop functions effectively from a small amount of input data. Using an adaptive regression model, it predicts loop confidence in the absence of 3D genomic data. On this basis, we build a 3D genomics loop analysis and visualization platform to help biologists explore loops and the related molecular regulatory mechanisms.

(3) To address the limitation that a single 3D genomics experiment can capture loops mediated by only one type of protein, we propose FusNet, a protein-mediated loop identification method based on model fusion. FusNet consists of feature extractors, predictors, and a model fusion layer. The feature extractor uses convolutional neural networks for dimensionality reduction and information extraction from genomic sequences and open chromatin region features. Gradient boosted decision trees, support vector classification, and k-nearest neighbor models serve as base predictors, and the model fusion layer is built with a stacking scheme: the predictions of each base model are used as new features to train the fusion model, which predicts the final loop formation probability. FusNet can effectively predict loops mediated by various proteins, and its predictions correlate strongly with pathogenic mutation sites.

(4) To address the unclear relationship between changes in 3D genome structure and mechanisms of disease occurrence, we first analyze pathogenic mutations in different hierarchical structures and then propose 3DFunc, a method for analyzing associations between 3D genome hierarchical structures and disease. 3DFunc integrates gene expression levels from 35 cell lines and chromatin conformation capture data from 33 cell lines, and is modeled using nonlinear least squares estimation. It can accurately score pathogenic mutation-target gene pairs in different 3D hierarchical structures. In addition, by integrating a large body of literature on mutations and diseases related to hierarchical structures, 3DFunc re-scores and ranks mutation-target gene pairs, and we construct 3DGeOD, a database of associations between 3D genome hierarchical structures and disease, providing new insights for research on disease mechanisms.
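As an illustration of the Poisson-based peak detection named in contribution (1), the following is a minimal sketch rather than the PCEC-Loop implementation, whose details the abstract does not give. It assumes read counts per genomic bin, a background rate taken as the mean count across bins, and a fixed p-value cutoff; the function names `poisson_sf` and `call_peaks`, the toy counts, and the threshold are all hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    # P(X >= k) = 1 - sum_{i=0}^{k-1} exp(-lam) * lam**i / i!
    term = math.exp(-lam)  # i = 0 term
    cdf = term
    for i in range(1, k):
        term *= lam / i    # build lam**i / i! incrementally
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(counts, p_threshold=0.01):
    """Return indices of bins whose count is unlikely under the background.

    Assumption: the background rate is the mean count over all bins.
    Real peak callers typically use local or control-based backgrounds.
    """
    lam = sum(counts) / len(counts)
    return [i for i, c in enumerate(counts) if poisson_sf(c, lam) < p_threshold]


# Toy read counts per bin (made up for illustration); bins 4 and 9 are enriched.
counts = [2, 3, 1, 2, 15, 2, 3, 1, 2, 12]
peaks = call_peaks(counts)
print(peaks)  # [4, 9]
```

In this toy setting the enriched bins would be the candidate loop anchors; how PCEC-Loop combines such peaks with sequence matching scores and tensor decomposition is not specified here.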
Keywords/Search Tags: 3D Genomics, Loop, Chromatin Layer, Fusion model, Multi-omics