
Transcription Factor Binding Site Prediction Based On DNA Sequences

Posted on: 2022-07-20 | Degree: Master | Type: Thesis
Country: China | Candidate: L C Shen | Full Text: PDF
GTID: 2510306752997589 | Subject: Software engineering
Abstract/Summary:
Knowledge of the specificity of DNA-protein binding is crucial for understanding the mechanisms of gene expression, gene regulation and gene therapy, and transcription factors are a common class of DNA-binding proteins. By binding to DNA, transcription factors regulate the expression of downstream genes, enhancing or inhibiting their activity, and they also play a vital role in regulating protein translation. Accurately identifying transcription factor binding sites from DNA sequences is therefore an important task. Methods based on molecular biology experiments can identify these sites, but they are time-consuming and costly. In recent years, the emergence of high-throughput sequencing technology, improvements in computer performance and advances in algorithms have made it possible to build prediction models that mine valuable knowledge from massive biological data. However, the accuracy of existing transcription factor binding site prediction models based on machine learning and deep learning still needs to be improved. This thesis presents an in-depth study of transcription factor binding site prediction based on DNA sequences. The main work is as follows:

(1) A transcription factor binding site prediction model named SAResNet is proposed, which combines the self-attention mechanism with a residual network. The self-attention mechanism captures long-range dependencies in the sequence and integrates positional information into the network, complementing the local information obtained by convolution, so that the network can learn both positional and local information effectively. In addition, SAResNet adopts transfer learning, which improves the generalization performance of the network and speeds up convergence in the fine-tuning stage. Experimental results show that SAResNet performs well on the benchmark data set for transcription factor binding site prediction: it improves on the best currently available method and shows good prediction performance and generalization ability.

(2) Because LSTM networks are well suited to processing sequence data and can handle long-term dependencies, a binding site prediction model named LSTM-Net is proposed based on the LSTM network. Experiments show that LSTM-Net has good prediction performance.

(3) Because SAResNet and LSTM-Net have similar predictive performance, this thesis also proposes a transcription factor binding site prediction model called SAResNet-LSTM, which combines the self-attention residual network with the LSTM network. The prediction performance of LSTM-Net, SAResNet and SAResNet-LSTM is compared on data sets of different scales and from different cell lines, and the three models are further compared with other existing prediction models on benchmark data sets. These comparisons demonstrate the accuracy and stability of the three models proposed in this thesis.

(4) To make the models easier for biomedical researchers to use, a prediction platform based on the Spring Cloud microservice framework has been developed to provide an online binding site prediction service.
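To make the self-attention and residual ideas mentioned in the abstract more concrete, the following is a minimal sketch in plain Python. It is not the thesis's SAResNet implementation: the function names (`one_hot`, `self_attention_residual`) are hypothetical, the query/key/value projections are assumed to be the identity (a trained model would learn them), and the convolutional layers, LSTM and transfer learning steps are omitted. It only illustrates how every position in a one-hot encoded DNA sequence can attend to every other position, and how a skip connection adds the input back to the attended output.

```python
import math

BASES = "ACGT"  # assumed channel order for the one-hot encoding


def one_hot(seq):
    """Encode a DNA string as a list of 4-dim vectors.

    Bases outside A/C/G/T (for example N) become all-zero vectors.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_residual(x):
    """Scaled dot-product self-attention followed by a skip connection.

    Assumption: Q, K and V are the input itself (identity projections),
    which keeps the sketch free of learned parameters.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this position to every position in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Weighted average of all positions (the "long-range" view).
        attended = [sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)]
        # Residual connection: add the original input back.
        out.append([a + qi for a, qi in zip(attended, q)])
    return out


if __name__ == "__main__":
    encoded = one_hot("ACGTAC")
    mixed = self_attention_residual(encoded)
    print(len(mixed), len(mixed[0]))  # prints "6 4"
```

Because the attention weights sum to one over the whole sequence, each output vector mixes information from distant positions, while the residual term preserves the local identity of the base at that position.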
Keywords/Search Tags: transcription factor binding site, self-attention mechanism, deep residual network, transfer learning, long short-term memory network, online prediction service, sequence analysis