| Many activities of biological cells are affected by the three-dimensional structure of the genome, in which insulators play an important role in organizing higher-order structures. CTCF is a mammalian insulator protein that acts as a barrier to ongoing chromatin loop extrusion. As a multifunctional protein, CTCF has tens of thousands of binding sites in the genome, but only a subset of them serve as chromatin loop anchors, and it is unclear how cells select anchor sites during chromatin looping. This paper studies the prediction of CTCF loop anchors in the GM12878 cell line. First, we compared the prediction performance of different feature extraction methods applied to the CTCF motif and its flanking sequences. Sequential coding produced fewer feature dimensions and higher performance, so it was selected as the feature extraction method. Then, CTCF binding intensity was used to predict CTCF loop anchors, and the performance of a support vector machine (SVM) was compared with that of other machine learning algorithms. The SVM achieved the best results in accuracy and in the four other evaluation metrics, with an accuracy of 87.34%. We further compared the sequence preferences and binding strengths of anchor and non-anchor CTCF binding sites and found that loop anchor formation is mainly influenced by CTCF binding intensity and binding pattern. In addition, positions 12-14 and 45-47 in the flanking sequences contributed significantly to the classification. Positions 45-47 correspond to zinc fingers 1-3 of CTCF, whereas positions 12-14 lie in the region bound by zinc fingers 8-11. Thus, in the GM12878 cell line, the flanking sequence can influence loop anchor formation by affecting zinc finger binding. This work provides a reference for the prediction of CTCF-mediated chromatin loops. We then analyzed the signal values of histone modifications and transcription factors in and around loop anchors, and compared them with the signal values in and around non-loop anchors. In the GM12878 cell line, the histone variant H2AFZ, the histone modifications H3K4me1, H3K4me2, H3K9ac, H3K27ac, and H3K79me2, and the transcription factors MAZ, SMC3, YY1, ZNF143, and RAD21 showed significant differences in signal values between loop and non-loop anchors. This indicates that, in GM12878, chromatin loop formation is closely related to the interaction of CTCF with these histones and transcription factors. Because previous studies have confirmed that these histones and transcription factors participate in chromatin loop formation, this result contributes to understanding the mechanism of loop anchor selection and further supports the reliability of the CTCF loop anchor data constructed in this study. |
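The abstract reports that sequential coding yields fewer feature dimensions than alternative encodings. The sketch below illustrates that dimensionality difference by contrasting a per-base integer code with one-hot encoding. The specific mapping (A=1, C=2, G=3, T=4, with 0 for any other symbol such as N) is an assumption for illustration; the paper's exact scheme is not given here, and the example sequence is arbitrary.

```python
# Hypothetical "sequential coding": one integer feature per base.
# The A/C/G/T -> 1..4 mapping and 0 for unknown bases are assumptions.
SEQUENTIAL_CODE = {"A": 1, "C": 2, "G": 3, "T": 4}

# One-hot encoding for comparison: four binary features per base.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def sequential_encode(seq: str) -> list[int]:
    """Return a length-L feature vector (one integer per position)."""
    return [SEQUENTIAL_CODE.get(base, 0) for base in seq.upper()]


def one_hot_encode(seq: str) -> list[int]:
    """Return a length-4L feature vector (four binary values per position)."""
    features: list[int] = []
    for base in seq.upper():
        features.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return features


if __name__ == "__main__":
    site = "TGGCCACCAGGGGGCGCTA"  # arbitrary 19-bp example sequence
    print(len(sequential_encode(site)), len(one_hot_encode(site)))  # → 19 76
```

For a motif plus flanking window of length L, sequential coding gives L features versus 4L for one-hot, which is the kind of reduction the abstract refers to; whether it also improves accuracy depends on the classifier and data.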
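The abstract states that several histone marks and transcription factors show significant signal differences between loop and non-loop anchors, but it does not name the statistical test. As an assumption, a nonparametric two-sample test such as the Mann-Whitney U test is a common choice for comparing signal values between two groups of sites. The following is a minimal pure-Python sketch using the normal approximation with a tie correction and no continuity correction; the function names are hypothetical and it is not presented as the paper's method.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_two_sided
```

In practice one would call `mann_whitney_u(loop_signals, nonloop_signals)` once per histone mark or transcription factor, where both arguments are hypothetical lists of signal values, and then correct the resulting p-values for multiple testing. For small samples an exact test is preferable to this approximation.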