
Prediction Of Transcription Factor Binding Sites Based On Deep Learning

Posted on: 2024-01-17    Degree: Master    Type: Thesis
Country: China    Candidate: K X Feng
GTID: 2530307295459544    Subject: Mathematics
Abstract/Summary:
Transcription factors (TFs), which activate or inhibit gene transcription by binding to specific non-coding regions of DNA, play an indispensable role in gene expression. The specific regions bound by TFs are called transcription factor binding sites (TFBSs). The rapid development of high-throughput sequencing technology has produced a large amount of TF-DNA binding data for TFBS-related tasks. These data provide an unprecedented opportunity to develop computational methods for predicting TFBSs and motifs. Many computational tools have been developed in recent years to study the mechanism of TF-DNA binding, yet our understanding of it remains fragmented. In this thesis, we developed integrated computational methods at the sequence level and at base resolution, respectively, to systematically reveal human and mouse TF-DNA binding mechanisms.

(1) We developed MulTFBS, a multi-channel deep learning framework for predicting transcription factor binding sites at the DNA sequence level. MulTFBS integrates three types of features: one-hot coding, three-dimensional DNA shape features of nucleotides, and Word2vec features that capture similarity between nucleotides. Each feature type is fed into its own channel of the network for feature extraction, and all three channels are deep networks based on CNNs and an attention mechanism. The outputs of the three channels are concatenated at the end of the feature-integration network and passed to fully connected layers. We predicted probe intensity at the sequence level on in vitro PBM datasets for 66 mouse TFs, and classified sequences as bound or unbound on in vivo ChIP-seq datasets for 5 mouse TFs. The results show that MulTFBS is highly competitive both in predicting probe intensity and in identifying TFBSs.

(2) Few previous studies have predicted TFBSs at base resolution. We therefore developed GNet, an integrated context-aware neural framework for predicting transcription factor binding signals at single-nucleotide resolution. GNet is a deep learning framework built from encoder-decoder blocks. It applies highway and gating mechanisms throughout the network and integrates an improved dual attention mechanism (DEA) to learn latent features automatically. Three feature encodings, namely one-hot coding, ring-function-hydrogen chemical properties (RFHC) and nucleotide density (ND), were used for three tasks: distinguishing bound from unbound regions at the sequence level, predicting binding signals at single-nucleotide resolution, and recognizing TF-DNA binding motifs. We evaluated GNet on ChIP-seq datasets for 53 human TFs and on 6 ATAC-seq chromatin accessibility datasets, and we also ran cross-species experiments between human and mouse. The results show that our framework outperforms existing algorithms at both base resolution and the sequence level.
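The sketch below illustrates two of the MulTFBS input encodings. It assumes several details that the abstract does not give. One-hot columns are ordered A, C, G, T, and ambiguous bases such as N become all-zero rows. The Word2vec "words" are overlapping 3-mers with stride 1. The function names are hypothetical, and the sketch omits both the Word2vec training step and the DNA shape channel.

```python
# Hypothetical sketch of two MulTFBS-style encodings (details assumed, not from the thesis).
BASES = "ACGT"

def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA sequence as an L x 4 one-hot matrix (columns A, C, G, T).
    Bases outside ACGT (e.g. N) become all-zero rows."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers with stride 1, a common way to
    turn DNA into 'sentences' before training Word2vec embeddings (k is assumed)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

if __name__ == "__main__":
    print(one_hot("ACGN"))
    print(kmer_tokens("ACGTAC", k=3))
```

Each k-mer token would then be mapped to a learned embedding vector, which gives the third channel an L' x D input, where L' = L - k + 1.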
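The MulTFBS fusion step consists of per-channel attention, then concatenation, then fully connected layers. It can be sketched in plain Python under the following assumptions:

- Each channel's CNN has already produced an L x D feature map. The convolutions are not shown.
- Attention is simple dot-product pooling with a fixed query vector, standing in for learned weights. The abstract does not describe the thesis's actual attention design.
- The output head is linear for PBM probe intensity and sigmoid for bound/unbound ChIP-seq labels. This head design is also an assumption.

All names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(feature_map, query):
    """Collapse an L x D feature map into one D-dimensional vector.

    Each position is scored by its dot product with `query` (a stand-in for
    learned attention parameters), then positions are averaged with softmax weights.
    """
    scores = [sum(f * q for f, q in zip(row, query)) for row in feature_map]
    weights = softmax(scores)
    dim = len(feature_map[0])
    return [sum(w * row[d] for w, row in zip(weights, feature_map)) for d in range(dim)]

def fuse_channels(channel_maps, queries):
    """Pool each channel independently, then concatenate the pooled vectors."""
    fused = []
    for feature_map, query in zip(channel_maps, queries):
        fused.extend(attention_pool(feature_map, query))
    return fused

def output_unit(x, weights, bias, binary=False):
    """One fully connected output unit: linear for probe intensity,
    sigmoid for a bound/unbound label (assumed head design)."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z)) if binary else z

if __name__ == "__main__":
    # Hypothetical channel outputs for a 3-position window.
    onehot_map = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    shape_map = [[0.2, 0.5], [0.1, 0.4], [0.3, 0.6]]
    w2v_map = [[0.7, -0.1, 0.0], [0.2, 0.3, 0.1], [-0.4, 0.5, 0.9]]
    fused = fuse_channels(
        [onehot_map, shape_map, w2v_map],
        [[0.5, 0.0, 1.0, 0.0], [1.0, 1.0], [0.0, 0.0, 1.0]],
    )
    print(len(fused))  # 4 + 2 + 3 = 9 concatenated features
    print(output_unit(fused, [0.1] * len(fused), 0.0, binary=True))
```

Pooling each channel before concatenation lets the channels have different widths D. Only the fused vector needs a fixed size for the fully connected layers.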
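The abstract does not define the RFHC or ND encodings used by GNet. The sketch below therefore assumes the definitions common in the sequence-encoding literature:

- **RFHC:** three binary flags per base, for ring structure (purine or pyrimidine), functional group (amino or keto) and hydrogen-bond strength (weak or strong).
- **ND:** at each position, the running frequency of that position's base within the prefix ending there.

The function and table names are hypothetical.

```python
# Assumed RFHC scheme: (ring structure, functional group, hydrogen bonding) as 0/1 flags.
RFHC = {
    "A": (1, 1, 1),  # purine, amino group, weak pairing (2 hydrogen bonds)
    "C": (0, 1, 0),  # pyrimidine, amino group, strong pairing (3 hydrogen bonds)
    "G": (1, 0, 0),  # purine, keto group, strong pairing
    "T": (0, 0, 1),  # pyrimidine, keto group, weak pairing
}

def nucleotide_density(seq: str) -> list[float]:
    """ND at 1-based position i: the fraction of seq[:i] made up of the base
    found at position i."""
    counts: dict[str, int] = {}
    density = []
    for i, base in enumerate(seq.upper(), start=1):
        counts[base] = counts.get(base, 0) + 1
        density.append(counts[base] / i)
    return density

def rfhc_nd(seq: str) -> list[list[float]]:
    """Per-position 4-dim feature: three RFHC flags followed by ND.
    Bases outside ACGT (e.g. N) get all-zero RFHC flags."""
    seq = seq.upper()
    return [
        [*RFHC.get(base, (0, 0, 0)), nd]
        for base, nd in zip(seq, nucleotide_density(seq))
    ]

if __name__ == "__main__":
    for row in rfhc_nd("ACGA"):
        print(row)
```

Under these assumptions, the second A in "ACGA" gets ND = 2/4 = 0.5, because two of the first four bases are A.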
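GNet's highway-and-gating idea can be illustrated with the standard highway layer of Srivastava et al.:

y = T(x) * H(x) + (1 - T(x)) * x

Here a sigmoid gate T decides, per feature, how much of the transformed signal H(x) to pass and how much of the input x to carry through unchanged. The abstract does not give GNet's exact gating formulation, so this is a generic single-layer sketch under several assumptions:

- Dense layers are used instead of convolutions.
- H uses a ReLU activation.
- The function names are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, b):
    """Compute W @ x + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def highway(x, W_h, b_h, W_t, b_t):
    """One highway layer; W_h and W_t must be square so y has the same size as x."""
    h = [max(0.0, z) for z in affine(W_h, x, b_h)]  # transform H(x) with ReLU
    t = [sigmoid(z) for z in affine(W_t, x, b_t)]   # gate T(x), each entry in (0, 1)
    # The gate blends the transformed signal with the untouched input (carry = 1 - T).
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

if __name__ == "__main__":
    x = [1.0, -2.0]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # A zero gate pre-activation gives T = 0.5, an even blend of H(x) = [1, 0] and x.
    print(highway(x, identity, [0.0, 0.0], zeros, [0.0, 0.0]))  # [1.0, -1.0]
```

A strongly negative gate bias makes the layer pass x through almost unchanged, which is what lets deep stacks of such layers train stably. A strongly positive bias makes it behave like a plain transform.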
Keywords: TFBS prediction, Deep learning, Attention mechanism, Highway gating mechanism, Motif recognition