A Supervised Learning Based Method For Predicting Chromatin Topologically Associating Domains

Posted on: 2021-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Zhao
GTID: 2480306050468364
Subject: Master of Engineering
Abstract:
Studying the three-dimensional structure of chromatin is of great significance for understanding the regulatory mechanisms of gene expression, and it supports research on cell function, biological development, disease occurrence, and other fields. With the development of high-throughput chromosome conformation capture (Hi-C) technology, a large amount of Hi-C data has been generated. Hi-C data describe chromatin interaction frequencies across the whole genome, which makes it possible to study the three-dimensional structure of chromatin genome-wide. 3D genome studies have found that chromatin has a hierarchical spatial structure which, from the macro to the micro scale, consists of chromosome territories, A/B compartments, topologically associating domains (TADs), and chromatin loops. These different structures play important roles in the regulation of gene expression. TADs have been found to have important biological functions in heredity and development and have attracted wide attention from biologists. It is therefore of great significance to detect TADs accurately and at low cost for a variety of cell lines, tissues, and disease conditions. To study how chromatin folds from a one-dimensional linear sequence into a three-dimensional spatial structure, and how this structure is distributed across the whole genome, researchers have designed different structural pattern detection methods based on their respective research backgrounds. For example, the DI and TopDom methods define one-dimensional indicators of TADs from observations of Hi-C data and then detect TADs from these indicators, while the IC-Finder and MSTD methods detect TADs based on clustering theory. At present, however, TAD detection methods are mainly based on unsupervised learning. This thesis proposes a supervised learning method for TAD prediction that uses functional genomics data as features to predict the boundaries of chromatin topologically associating domains.
The method was applied to the GM12878, IMR90, NHEK, and K562 cell lines. Chromatin immunoprecipitation sequencing (ChIP-seq) data and Hi-C data for these four cell lines were collected from several databases, such as the Encyclopedia of DNA Elements (ENCODE). Three unsupervised methods, HiCDB, TopDom, and MSTD, were used to detect TAD boundaries from the Hi-C data, and positive and negative samples were extracted from these boundaries. Sample features were extracted from the ChIP-seq data, a supervised learning model was trained to classify the samples, and cross-validation was used for parameter selection and method evaluation. The experimental results show that one-dimensional functional genomics data have good predictive power for TADs. Moreover, a model trained on one cell line can successfully predict the TAD boundaries of another cell line, showing good cross-cell-line generalization. Epigenetic modification data, structural protein binding data, and chromatin accessibility data were also found to play important roles in TAD boundary prediction.
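The abstract says that DI reduces the Hi-C matrix to a one-dimensional per-bin indicator. DI is the directionality index of Dixon et al. (2012): for each bin, A and B are its summed contacts with a fixed number of upstream and downstream bins, E = (A + B) / 2, and DI = ((B - A) / |B - A|) * ((A - E)^2 / E + (B - E)^2 / E). The following is a minimal pure-Python sketch, assuming a dense, symmetric, non-negative contact matrix. The function name, window size, and toy matrix are hypothetical, since the abstract does not give the settings used in the thesis.

```python
from typing import List, Sequence


def directionality_index(contacts: Sequence[Sequence[float]], window: int) -> List[float]:
    """Directionality index per bin of a dense, symmetric, non-negative Hi-C matrix.

    Hypothetical helper: A = contacts with the `window` upstream bins,
    B = contacts with the `window` downstream bins, E = (A + B) / 2.
    """
    n = len(contacts)
    di: List[float] = []
    for i in range(n):
        a = sum(contacts[i][j] for j in range(max(0, i - window), i))
        b = sum(contacts[i][j] for j in range(i + 1, min(n, i + window + 1)))
        if a == b:  # no directional bias (also covers a == b == 0)
            di.append(0.0)
            continue
        e = (a + b) / 2.0  # > 0 here because counts are non-negative and a != b
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


# Toy matrix: two 3-bin domains (bins 0-2 and 3-5) with strong intra-domain contacts.
domain = [0, 0, 0, 1, 1, 1]
toy = [[0.0 if i == j else (10.0 if domain[i] == domain[j] else 1.0)
        for j in range(6)] for i in range(6)]
di = directionality_index(toy, window=2)
print(di)  # roughly [20.0, 0.048, -14.73, 14.73, -0.048, -20.0]
```

The sign change from negative (bin 2, contacts biased upstream) to positive (bin 3, contacts biased downstream) is the one-dimensional signal that marks the domain boundary between bins 2 and 3.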
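The abstract outlines the supervised pipeline: boundaries called by unsupervised methods become positive samples, ChIP-seq signal around each sample becomes its features, and a classifier is trained and evaluated by cross-validation. It does not name the model, the window sizes, or how negatives were chosen, so the sketch below fills those gaps with hypothetical choices. Positives are boundary bins, negatives are bins at least `exclusion` bins from every boundary, features are the mean signal of each track in a window of `flank` bins on either side, and a simple nearest-centroid classifier stands in for whatever model the thesis trained, evaluated with stratified k-fold cross-validation. All names and the toy tracks are illustrative, not data from the thesis.

```python
import math
from typing import List, Sequence, Set, Tuple


def label_bins(n_bins: int, boundaries: Set[int], exclusion: int) -> Tuple[List[int], List[int]]:
    """Boundary bins are positives; bins >= `exclusion` bins from every boundary are negatives."""
    positives = sorted(boundaries)
    negatives = [i for i in range(n_bins)
                 if all(abs(i - b) >= exclusion for b in boundaries)]
    return positives, negatives


def bin_features(tracks: Sequence[Sequence[float]], i: int, flank: int) -> List[float]:
    """Mean signal of each binned 1D track (e.g. ChIP-seq coverage) over bins i-flank..i+flank."""
    feats = []
    for track in tracks:
        lo, hi = max(0, i - flank), min(len(track), i + flank + 1)
        feats.append(sum(track[lo:hi]) / (hi - lo))
    return feats


def centroid(rows: List[List[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_predict(c_pos: Sequence[float], c_neg: Sequence[float],
                             x: Sequence[float]) -> int:
    """Stand-in classifier: 1 (boundary) if x is closer to the positive centroid."""
    return 1 if math.dist(x, c_pos) < math.dist(x, c_neg) else 0


def stratified_cv_accuracy(x_pos: List[List[float]], x_neg: List[List[float]], k: int) -> float:
    """k-fold CV; positives and negatives are dealt round-robin so every fold has both."""
    accs = []
    for f in range(k):
        test = [(x, 1) for x in x_pos[f::k]] + [(x, 0) for x in x_neg[f::k]]
        train_pos = [x for j, x in enumerate(x_pos) if j % k != f]
        train_neg = [x for j, x in enumerate(x_neg) if j % k != f]
        c_pos, c_neg = centroid(train_pos), centroid(train_neg)
        correct = sum(nearest_centroid_predict(c_pos, c_neg, x) == y for x, y in test)
        accs.append(correct / len(test))
    return sum(accs) / k


# Toy data: a CTCF-like track that peaks only at boundaries, plus a flat background track.
n_bins = 100
boundaries = {10, 30, 50, 70, 90}
ctcf_like = [5.0 if i in boundaries else 0.0 for i in range(n_bins)]
background = [1.0] * n_bins
pos, neg = label_bins(n_bins, boundaries, exclusion=3)
X_pos = [bin_features([ctcf_like, background], i, flank=1) for i in pos]
X_neg = [bin_features([ctcf_like, background], i, flank=1) for i in neg]
acc = stratified_cv_accuracy(X_pos, X_neg, k=5)
print(len(pos), len(neg), acc)  # 5 75 1.0
```

Because the toy track peaks only at boundaries, every fold is classified perfectly. Real ChIP-seq tracks are noisy, so the choice of model, flank, and exclusion distance would matter. The cross-cell-line test described in the abstract corresponds to computing the centroids (or fitting the model) on one cell line's samples and scoring another cell line's samples instead of held-out folds.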
Keywords: 3D genome, Chromatin Topologically Associating Domains, Hi-C data, ChIP-seq data