
Research On Promoter Recognition And Classification Based On Deep Learning Framework

Posted on: 2022-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: A Sun
Full Text: PDF
GTID: 2480306317468514
Subject: Statistics
Abstract/Summary:
The promoter is a DNA sequence that contains the transcription start site and has a basic regulatory function. It is responsible for initiating the transcription of a specific gene in the genome and controls the initiation of transcription and the expression intensity of the gene. Precise identification of promoters is of great significance for understanding transcriptional regulation. Identifying promoters with machine learning and other computational methods saves cost and time compared with traditional biochemical experiments. In recent years, deep learning has become increasingly popular in bioinformatics, especially in promoter recognition. This thesis focuses on the application of deep learning in bioinformatics and is divided into two parts: eukaryotic promoters and prokaryotic promoters.

In the study of eukaryotic promoters, the TATA-box promoter plays an important role in gene transcription. To identify TATA-box promoters in eukaryotes quickly and accurately, this thesis uses convolutional neural networks (CNN) to design and build a two-layer classifier, iPTT(2L)-CNN. The first layer identifies whether a DNA sequence is a promoter, and the second layer identifies whether an identified promoter belongs to the TATA-box type or the TATA-less type. An online recognition service for iPTT(2L)-CNN is also provided for researchers: http://www.jci-bioinfo.cn/iPTT(2L)-CNN. Under 5-fold cross-validation, the prediction accuracies of the first and second layers of the predictor are 91.97% and 94.70%, respectively, so it can effectively identify eukaryotic promoters and their types.

In the study of prokaryotic promoters, this thesis encodes DNA sequences using multi-feature fusion and combines this encoding with XGBoost, an ensemble learning classification algorithm, to construct a two-layer predictor, iPSI(2L)-XGBoost. The first layer identifies whether a DNA sequence is a prokaryotic promoter, and the second layer identifies whether the promoter belongs to the strong type or the weak type. Classification by promoter strength has been proposed in recent years and is an active topic in the study of prokaryotic promoters. A feature encoding method based on principal component analysis is proposed and combined with features extracted by convolutional neural networks. The resulting predictor outperforms existing predictors, with prediction accuracies of 94.13% and 85.36% for its two layers, respectively. The predictor iPSI(2L)-XGBoost is therefore an effective tool for identifying prokaryotic promoters and their types. The results of this research can be helpful in the field of promoter identification and classification and are of great significance to disease research, drug development, and bioengineering.
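
The two-layer decision flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the one-hot encoding, the function names, and the two toy rules standing in for the trained layers are hypothetical and are not the thesis's models or features. The sketch only shows how a second layer is consulted after the first layer has accepted a sequence as a promoter.

```python
from typing import Callable, List

BASES = "ACGT"
Encoded = List[List[int]]


def one_hot(seq: str) -> Encoded:
    """Encode a DNA sequence as one 4-element row per base (A, C, G, T order).

    One-hot encoding is an assumption here; the abstract does not state
    which input encoding the CNN uses.
    """
    if not seq:
        raise ValueError("empty sequence")
    rows = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unsupported base: {base!r}")
        rows.append([int(base == b) for b in BASES])
    return rows


def two_layer_predict(
    seq: str,
    layer1: Callable[[Encoded], bool],
    layer2: Callable[[Encoded], bool],
) -> str:
    """Run layer 1 (promoter or not), then layer 2 (promoter type) if needed."""
    x = one_hot(seq)
    if not layer1(x):
        return "non-promoter"
    return "TATA-box promoter" if layer2(x) else "TATA-less promoter"


# Hypothetical stand-ins for the trained layers (NOT the thesis's CNNs).
def toy_is_promoter(x: Encoded) -> bool:
    """Toy rule: accept a sequence if at least half of its bases are A or T."""
    at_count = sum(row[0] + row[3] for row in x)
    return at_count / len(x) >= 0.5


def toy_is_tata_box(x: Encoded) -> bool:
    """Toy rule: decode the one-hot rows and look for the motif TATAAA."""
    decoded = "".join(BASES[row.index(1)] for row in x)
    return "TATAAA" in decoded


if __name__ == "__main__":
    for s in ("GCGCGCGC", "TTTATAAAAGG", "AATTAATTGC"):
        print(s, "->", two_layer_predict(s, toy_is_promoter, toy_is_tata_box))
```

The same cascade shape would fit the prokaryotic case, with the second layer deciding between strong and weak promoters; in a real pipeline the two callables would wrap trained classifiers rather than fixed rules.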
Keywords/Search Tags:Promoter prediction, Promoter classification, Deep learning, Convolutional neural network