Network Regularized Optimization Modeling Of DNA Methyltransferase Binding Prediction

Posted on: 2021-02-15  Degree: Master  Type: Thesis
Country: China  Candidate: L X Ren  Full Text: PDF
GTID: 2370330620976543  Subject: Mathematics
Abstract/Summary:
High-dimensional optimization problems in big data modeling usually introduce regularization to restrict the complexity of the model, improve its interpretability, and reduce overfitting so as to improve prediction accuracy. For example, sparsity regularization arose in the field of signal processing, and classic models such as the lasso and compressed sensing were developed from it. In this thesis, we propose an optimization model for biomedical data that introduces the network structure of the relationships between variables into the regularization. Specifically, we construct a network-regularized optimization model to predict DNA methyltransferase (DNMT) binding by integrating multi-omics data such as transcriptome, epigenome, and protein interaction data.

DNA methylation mediated by DNMT plays an important role in embryonic development and tumorigenesis. However, DNMT binding data are still missing for most tissues and cell lines, and this limitation has hindered progress in understanding the functional mechanism of this important protein family. It is therefore necessary to develop computational biology methods that predict DNMT binding by integrating a large amount of multi-omics data. In this thesis, we develop an optimization model that integrates multi-omics data to predict the binding sites of DNMT on the genome. The main work includes:

(1) We propose an adaptive lasso-regularized logistic regression model, GuidingNet, for predicting the genome-wide binding of DNMTs by integrating gene expression, chromatin accessibility, sequence, and protein-protein interaction data. The main contribution is to reconstruct the regulatory network from the protein interaction network and cross-tissue expression data, and to propose a regularized optimization model that selects the adaptive lasso weights based on network topology. Unlike traditional weight selection methods, GuidingNet takes into account biological knowledge of the interactions between transcription factors, which greatly enhances the biological interpretability of the model.
(2) The output of GuidingNet includes a transcription factor (TF) network for binding prediction. The structure of this TF network helps to interpret the mechanism of DNMT binding in different tissues and cell types.
(3) We tested GuidingNet on several DNMTs in several cell lines in both human and mouse. It performs well in both prediction accuracy and feature selection. GuidingNet can also be applied to predict DNMT binding across tissue contexts.
(4) We generalize GuidingNet to predict the binding sites of other chromatin regulators (CRs) in human and mouse. The method again achieves good performance in both prediction accuracy and feature selection, and GuidingNet also selects biologically important transcription factors.

In summary, this thesis proposes GuidingNet, an optimization model based on network regularization. As a general model framework, GuidingNet can be applied to predict the binding of other CRs in different cellular contexts and to further the understanding of their binding mechanisms.
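To make the idea of an adaptive lasso-regularized logistic regression with network-derived weights concrete, the following is a minimal sketch and not the GuidingNet implementation. It minimizes the mean logistic loss plus lam * sum_j w_j * |beta_j| by proximal gradient descent. The abstract does not state how the weights are computed from the network topology, so the rule used here (w_j = 1 / (1 + degree_j), meaning better-connected features are penalized less) is purely an assumption for illustration. The function names, the toy adjacency matrix, and the synthetic data are likewise hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def soft_threshold(v, t):
    """Proximal operator of the (weighted) L1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def network_penalty_weights(adjacency):
    """ASSUMED rule, not taken from the thesis: w_j = 1 / (1 + degree_j)."""
    degree = np.asarray(adjacency).sum(axis=1)
    return 1.0 / (1.0 + degree)


def fit_weighted_lasso_logistic(X, y, penalty_weights, lam=0.02, n_iter=2000):
    """Minimize mean logistic loss + lam * sum_j w_j * |beta_j| (ISTA).

    The intercept is left unpenalized.
    """
    n, p = X.shape
    beta = np.zeros(p)
    intercept = 0.0
    # Step size 1/L, where L = ||[X, 1]||_2^2 / (4 n) bounds the curvature
    # of the mean logistic loss.
    X_aug = np.hstack([X, np.ones((n, 1))])
    step = 4.0 * n / (np.linalg.norm(X_aug, 2) ** 2)
    for _ in range(n_iter):
        resid = sigmoid(X @ beta + intercept) - y
        grad_beta = X.T @ resid / n
        grad_intercept = resid.mean()
        beta = soft_threshold(beta - step * grad_beta,
                              step * lam * penalty_weights)
        intercept -= step * grad_intercept
    return beta, intercept


# Toy example: 6 hypothetical TF features; only the first two drive binding.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 6))
true_beta = np.array([2.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = (rng.random(400) < sigmoid(X @ true_beta)).astype(float)

# Hypothetical symmetric TF-TF network with edges (0,1), (0,2), (1,2), (0,3).
adjacency = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (0, 3)]:
    adjacency[i, j] = adjacency[j, i] = 1.0

weights = network_penalty_weights(adjacency)
beta_hat, b_hat = fit_weighted_lasso_logistic(X, y, weights)
print("penalty weights:", np.round(weights, 3))
print("estimated coefficients:", np.round(beta_hat, 3))
```

In this sketch, a smaller weight means a weaker penalty, so features that are well connected in the network are more likely to be retained. Any other mapping from topology to weights could be substituted without changing the solver.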
Keywords/Search Tags:logistic regression, regularization, DNA methyltransferase, data integration, feature selection