Research And Implementation Of Data-Driven Promoter And Chromatin Loop Prediction Method

Posted on: 2024-02-10 | Degree: Master | Type: Thesis
Country: China | Candidate: P Y Zhang
GTID: 2530307121462664 | Subject: Computer technology
Abstract/Summary:
Gene expression and regulation form an important molecular basis for biological processes such as cell differentiation and individual development. As key gene regulatory units, promoters and chromatin loops act together with numerous cis-regulatory elements to influence biological development and disease onset. Although various experimental techniques have been developed to detect promoters and chromatin loops, they are usually time-consuming and expensive, which limits their practical application. With advances in sequencing technology and artificial intelligence, data-driven research has become an effective way to address this problem, and bioinformaticians have recently proposed several data-driven methods that predict promoters and chromatin loops from data characteristics. However, the performance and generalization ability of these methods are limited by their reliance on a single sequence feature encoding scheme and a simple model architecture. To address this, this study systematically examines different sequence feature encoding schemes, mines the characteristics of DNA sequences in depth, and constructs a data-driven promoter prediction model (iPro-WAEL) and a chromatin loop prediction model (CLNN-loop) to further improve prediction performance and generalization ability. In addition, this study explores important transcription factor motifs in promoter and chromatin loop regions, offering new directions for related research. Finally, it develops a promoter and chromatin loop prediction and analysis platform to support research in related fields. The main research contents are as follows:

(1) A promoter prediction algorithm based on ensemble learning

Existing promoter prediction algorithms rely on a single feature extraction method and a relatively simple model architecture, and they have not explored the transcription factor motifs in promoter regions. To address these problems, this study fuses multiple sequence-based features and constructs iPro-WAEL, a promoter prediction model based on a weighted average ensemble method. Extensive benchmarking on promoter datasets from seven species shows that iPro-WAEL achieves the best performance, exceeding the best-performing previous methods by 1.6% to 15.5% in average accuracy on the same datasets. The experimental results also show that iPro-WAEL performs well in cross-cell-line prediction and other settings, demonstrating strong generalization ability. Finally, a new computational approach is developed to identify important transcription factor motifs in promoter regions; important motifs such as ZNF143 are identified and their biological roles are elucidated. These motifs may offer new avenues for exploring gene expression in organisms, mining the relationships between genes and diseases, studying disease pathogenesis, and advancing disease diagnosis and treatment.

(2) A chromatin loop prediction algorithm based on deep learning

Existing chromatin loop prediction algorithms likewise rely on a single feature extraction method and a relatively simple model architecture, and they have not deeply explored the features shared within the sequence pairs that form chromatin loops. To address these problems, this study fuses multiple sequence-based features and constructs CLNN-loop, a deep-learning-based chromatin loop prediction model. Extensive benchmarking on chromatin loop datasets from two cell lines and four types of loops shows that CLNN-loop achieves the best performance and outperforms the other classification algorithms compared in predicting chromatin loops. The experimental results also show that CLNN-loop generalizes well in cross-cell-line and cross-type prediction, with an average AUC 4.14% higher than that of the best-performing previous method. Finally, this study applies the SHAP framework to explain the predictions of multiple classification algorithms and identifies the CTCF motif and sequence conservation as important markers of chromatin loops across different cell lines and types, offering a new direction for future research on sequence-based chromatin loop prediction.

(3) A promoter and chromatin loop prediction and analysis platform

To meet the application needs of promoter and chromatin loop research, this study uses the Django backend framework and the Layui frontend framework to develop a promoter and chromatin loop prediction and analysis platform. Built on the feature extraction methods of the two research contents above, the promoter prediction model (iPro-WAEL), the chromatin loop prediction model (CLNN-loop) and a transcription factor motif database, the platform provides four core functions: promoter prediction, promoter analysis, chromatin loop prediction and chromatin loop analysis, giving users a simple, convenient, flexible and efficient interactive experience.
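The abstract states only that iPro-WAEL fuses multiple sequence-based features through a weighted average ensemble; it does not specify the encodings, the base models, or how the weights are chosen. The Python sketch below illustrates the general technique under stated assumptions: k-mer frequency vectors stand in for one possible sequence encoding, the per-model probabilities are assumed to come from hypothetical base classifiers, and the weights are supplied by hand. The function names (`kmer_frequencies`, `weighted_average_ensemble`, `predict_labels`) are illustrative and not taken from the thesis.

```python
from itertools import product
from typing import List, Sequence


def kmer_frequencies(seq: str, k: int = 2) -> List[float]:
    """Encode a DNA sequence as normalised k-mer counts over A/C/G/T.

    One possible sequence encoding (an assumption, not necessarily one
    used by iPro-WAEL). Windows containing symbols other than A/C/G/T,
    such as N, are skipped.
    """
    seq = seq.upper()
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:
            counts[index[kmer]] += 1
            total += 1
    # Normalise to frequencies; an all-zero vector is returned if no valid window exists.
    return [c / total for c in counts] if total else counts


def weighted_average_ensemble(
    probs: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine positive-class probabilities from several base models.

    probs[m][j] is model m's probability that sample j is a promoter.
    Each sample's score is sum_m(w_m * p_mj) / sum_m(w_m).
    """
    if len(probs) != len(weights):
        raise ValueError("exactly one weight per base model is required")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(probs[0])
    if any(len(p) != n for p in probs):
        raise ValueError("all base models must score the same samples")
    return [sum(w * p[j] for w, p in zip(weights, probs)) / total_w for j in range(n)]


def predict_labels(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn ensemble scores into 1 (promoter) / 0 (non-promoter) labels."""
    return [1 if s >= threshold else 0 for s in scores]


if __name__ == "__main__":
    # Hypothetical outputs of two base models on three sequences.
    model_a = [0.9, 0.2, 0.6]
    model_b = [0.7, 0.4, 0.3]
    scores = weighted_average_ensemble([model_a, model_b], weights=[0.6, 0.4])
    print([round(s, 2) for s in scores], predict_labels(scores))
```

With these hypothetical inputs the combined scores are about [0.82, 0.28, 0.48], which a 0.5 threshold turns into labels [1, 0, 0]. Dividing by the sum of the weights keeps each score in [0, 1] whenever every base model outputs a probability, so a single threshold applies regardless of how many models are combined.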
Keywords/Search Tags: Promoter, Chromatin loop, Deep learning, Ensemble learning, Motif analysis