Machine Learning Prediction of Enhancer-Promoter Interactions and CTCF-Mediated Chromatin Loops

Posted on: 2021-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: X J Yu
GTID: 2480306230978279
Subject: Software engineering
Abstract/Summary:
With the continuous progress of high-throughput sequencing technologies, researchers have access to ever more epigenomic and sequence data and are extracting more and more information from it. Mining more meaningful information from these data, however, requires better algorithms, and the lack of such algorithms limits the further development of bioinformatics. Traditional biological data analysis techniques can only analyze specific data derived from the DNA sequence itself, in order to find gene mutations and to find patterns in the same data that explain what causes them. The information contained in these patterns is far from sufficient for the development of biology: such techniques struggle to mine hidden data, so the hidden value of the data is not fully realized.

Through supervised learning, machine learning algorithms can rank extracted features by importance and thereby identify the features that play a key role in prediction. In the past, machine learning algorithms were used only to solve classification and regression problems, and their results were not linked to biological significance. In this thesis, we perform two sets of experiments: prediction of enhancer-promoter interactions and prediction of chromatin loops. We use machine learning algorithms not only to study the classification problem but also to analyze its biological meaning.

We collected a large amount of data on histones, binding proteins, and transcription factors located between enhancers and promoters and within chromatin loops. From these data, a large number of features were analyzed in each cell line for classification. Accurately analyzing protein function makes it possible to better control gene expression and the occurrence of disease. For example, the experimental results in this thesis also show that CTCF is located between enhancers and promoters and binds to both genes and proteins. CTCF is widely involved in gene regulation, and CTCF and cohesin together form chromatin loops. These chromatin loops play a key role in precisely regulating gene expression and in anticipating gene mutations, which can improve risk control in disease treatment.

Our three contributions are:

(1) The EpPredictor method is proposed to predict enhancer-promoter interactions based on statistical features and sequence features. The method has two parts, the first of which uses statistical methods to extract features. Experimental results in six cell lines show that the method predicts interactions effectively compared with the latest method based on functional genomic signals.

(2) In the prediction of enhancer-promoter interactions, the LightGBM algorithm is used to rank the features, and the ranked features are mapped back to protein data. Many proteins are found to play an important role in predicting enhancer-promoter interactions. In particular, the H3K family of histone modifications and CTCF are essential for prediction, a finding confirmed by related articles.

(3) The first study confirmed that CTCF not only acts as an insulator of enhancers but is also considered able to form chromatin loops. To determine whether every CTCF site forms a chromatin loop, PC-Loop is proposed to predict CTCF-mediated chromatin loops. In this framework, we also apply the feature extraction method from the first experiment to chromatin loop prediction and compare it with the Lollipop method. The experimental results show that the proposed feature extraction method generalizes to distinguishing both enhancer-promoter interactions and chromatin loops, and that complexes of CTCF and cohesin are more likely to form chromatin loops.
Keywords/Search Tags: Enhancer-promoter interactions, LightGBM, gene expression, disease control, CTCF, chromatin loops