Study On Explainable Deep Learning Model For Predicting DNA-Binding Protein

Posted on: 2019-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: S Liu
GTID: 2370330626952104
Subject: Computer technology
Abstract/Summary:
DNA-binding proteins (DBPs) play key roles in cellular activities through their interactions with DNA. Deep-learning networks can now predict DNA-binding proteins from their primary sequences with high accuracy. However, because large amounts of data flow through deep-learning models and different functional layers process that data in different ways, the trained models and their results are difficult to explain. We designed the following experiments to address these problems.

(1) Simplified feature engineering, sequence-processing module, and classifier. We one-hot encode the primary sequence of each DNA-binding protein and use a one-dimensional convolutional layer with multiple convolution kernels as the first processing layer. We then transpose the result and convolve it with a single kernel of size one. The output of this sequence-processing stage is classified with logistic regression. Because the sequence processing and the classifier are completely linear up to the final logistic function, the successive operations can be merged directly into a single convolution kernel that is a weighted sum of the first-layer kernels. The cross-validation accuracy of this model reaches 80.5% to 86.6%. Using this model, we found similarities in the expression of certain amino acids across DNA-binding proteins.

(2) Explainable subnetworks. We replace the logistic regression with explainable subnetworks, which use multi-layer perceptrons to approximate an arbitrary rational function to a given precision, introducing one subnetwork per feature to learn that feature's nonlinear contribution. With the explainable subnetworks, we obtained a set of convolution kernels together with their nonlinear contributions, and concluded that the nonlinear model converges more quickly and accurately.

(3) A generative model based on a convolutional discriminator. We designed a score-increasing sequence-generation algorithm that produces sequences the existing classification models recognize as DNA-binding proteins, and then attempted to evaluate the generated primary-structure sequences.
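The kernel-folding argument in experiment (1) can be checked numerically. The following is a minimal sketch under our own assumptions, not the thesis implementation: it assumes the 20 standard amino acids, valid (unpadded) convolution, a size-one convolution modeled as a weighted sum over the K kernel channels (the hypothetical `mix` vector), and global sum pooling before the logistic regression. All function names are illustrative. Because every step before the sigmoid is linear, the K first-layer kernels collapse into one kernel, `sum_k mix[k] * kernels[k]`, that produces the same logit.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a protein sequence as an (L, 20) one-hot matrix; unknown residues stay all-zero."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        if aa in AA_INDEX:
            x[pos, AA_INDEX[aa]] = 1.0
    return x


def conv1d(x, kernels):
    """Valid 1-D convolution (cross-correlation, as in deep-learning libraries).

    x: (L, C) one-hot sequence, kernels: (K, w, C) -> output (L - w + 1, K). Requires L >= w.
    """
    w = kernels.shape[1]
    windows = np.stack([x[t:t + w] for t in range(x.shape[0] - w + 1)])  # (L', w, C)
    return np.einsum("twc,kwc->tk", windows, kernels)


def full_model_logit(x, kernels, mix, a, b):
    """Hypothetical unfolded pipeline: K-kernel conv -> size-1 conv across kernels -> sum pool -> logit."""
    h = conv1d(x, kernels)  # (L', K): one channel per first-layer kernel
    z = h @ mix             # size-1 convolution mixing the K channels into one, shape (L',)
    return a * z.sum() + b  # global sum pooling, then the logistic-regression logit


def fold_kernels(kernels, mix):
    """Merge the linear stages into one (w, C) kernel: sum_k mix[k] * kernels[k]."""
    return np.tensordot(mix, kernels, axes=1)


def folded_logit(x, kernels, mix, a, b):
    """The same logit, computed with the single folded kernel."""
    w_eff = fold_kernels(kernels, mix)[None]  # (1, w, C)
    return a * conv1d(x, w_eff).sum() + b


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
kernels = rng.normal(size=(8, 5, 20))  # 8 random kernels of width 5 (illustrative values)
mix = rng.normal(size=8)
x = one_hot("MKRISTTITTTITITTGNGAG")   # arbitrary example sequence
p_full = sigmoid(full_model_logit(x, kernels, mix, a=0.3, b=-0.1))
p_fold = sigmoid(folded_logit(x, kernels, mix, a=0.3, b=-0.1))
print(np.isclose(p_full, p_fold))  # True: both pipelines compute the same linear map
```

In this sketch each entry `fold_kernels(kernels, mix)[j, AA_INDEX[aa]]`, scaled by `a`, is the logit contribution of residue `aa` at offset `j` within a window, which is what makes a merged kernel directly inspectable.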
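Experiment (2) replaces the logistic regression with one subnetwork per feature, so the logit becomes a sum of one-dimensional learned functions. The sketch below illustrates that additive structure under assumptions of our own: a one-hidden-layer tanh MLP per feature, full-batch gradient descent on binary cross-entropy, and synthetic two-feature data. `AdditiveSubnetworks` and its methods are hypothetical names, not the thesis code; in the thesis setting the inputs would presumably be pooled convolution-kernel activations.

```python
import numpy as np


class AdditiveSubnetworks:
    """Hypothetical sketch: one small MLP g_i per input feature, logit = sum_i g_i(x_i) + b."""

    def __init__(self, n_features, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=1.0, size=(n_features, n_hidden))  # input -> hidden, per feature
        self.b1 = rng.normal(scale=0.5, size=(n_features, n_hidden))
        self.W2 = rng.normal(scale=0.1, size=(n_features, n_hidden))  # hidden -> scalar, per feature
        self.b = 0.0

    def contributions(self, X):
        """Return per-feature contributions g_i(x_i), shape (N, F), plus hidden activations."""
        h = np.tanh(X[:, :, None] * self.W1 + self.b1)  # (N, F, H); subnetwork i only sees x_i
        return (h * self.W2).sum(axis=-1), h

    def predict_proba(self, X):
        g, _ = self.contributions(X)
        return 1.0 / (1.0 + np.exp(-(g.sum(axis=1) + self.b)))

    def loss(self, X, y):
        """Mean binary cross-entropy."""
        p = np.clip(self.predict_proba(X), 1e-12, 1.0 - 1e-12)
        return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

    def step(self, X, y, lr=0.2):
        """One full-batch gradient-descent step with hand-written backpropagation."""
        g, h = self.contributions(X)
        p = 1.0 / (1.0 + np.exp(-(g.sum(axis=1) + self.b)))
        d_logit = (p - y) / len(y)                              # dLoss/dlogit, shape (N,)
        d_g = np.repeat(d_logit[:, None], g.shape[1], axis=1)   # each g_i enters the logit with weight 1
        d_pre = d_g[:, :, None] * self.W2 * (1.0 - h ** 2)      # back through tanh, (N, F, H)
        self.W2 -= lr * np.einsum("nf,nfh->fh", d_g, h)
        self.W1 -= lr * (d_pre * X[:, :, None]).sum(axis=0)
        self.b1 -= lr * d_pre.sum(axis=0)
        self.b -= lr * d_logit.sum()


# Synthetic data (illustrative only): the label depends nonlinearly on feature 0, not on feature 1.
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(400, 2))
y = (np.abs(X[:, 0]) > 1.0).astype(float)

model = AdditiveSubnetworks(n_features=2)
loss_before = model.loss(X, y)
for _ in range(1000):
    model.step(X, y)
loss_after = model.loss(X, y)

# Read out each feature's learned contribution curve on a grid (the "explanation").
grid = np.linspace(-2.0, 2.0, 9)
curves, _ = model.contributions(np.column_stack([grid, grid]))
print(f"loss {loss_before:.3f} -> {loss_after:.3f}")
print("g_0:", np.round(curves[:, 0], 2))
print("g_1:", np.round(curves[:, 1], 2))
```

If the fit succeeds, `g_0` should rise toward both ends of the grid while `g_1` stays comparatively flat. Because the model is additive, each curve can be read on its own as that feature's nonlinear contribution.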
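The abstract does not specify how the score-increasing generator in experiment (3) searches sequence space. The sketch below assumes a simple greedy hill-climb: starting from a random sequence, each round tries every single-residue substitution and keeps the one that most raises the classifier score. The scorer is a hypothetical stand-in (one random convolution kernel followed by a sigmoid), and `make_scorer` and `increase_score` are illustrative names. A trained convolutional discriminator would presumably take the scorer's place.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def make_scorer(kernel, bias=0.0):
    """Hypothetical stand-in classifier: sigmoid of the summed single-kernel convolution
    over a one-hot sequence. kernel has shape (w, 20)."""
    w = kernel.shape[0]

    def score(seq):
        x = np.zeros((len(seq), len(AMINO_ACIDS)))
        x[np.arange(len(seq)), [AA_INDEX[aa] for aa in seq]] = 1.0
        total = sum(np.sum(x[t:t + w] * kernel) for t in range(len(seq) - w + 1))
        return 1.0 / (1.0 + np.exp(-(total + bias)))

    return score


def increase_score(seq, score, n_rounds=10):
    """Greedy score-increasing search: each round tries every single-residue substitution,
    keeps the best one if it raises the score, and stops early at a local optimum."""
    best, best_score = seq, score(seq)
    history = [best_score]
    for _ in range(n_rounds):
        candidate, candidate_score = best, best_score
        for pos in range(len(best)):
            for aa in AMINO_ACIDS:
                if aa == best[pos]:
                    continue
                trial = best[:pos] + aa + best[pos + 1:]
                trial_score = score(trial)
                if trial_score > candidate_score:
                    candidate, candidate_score = trial, trial_score
        if candidate == best:
            break  # no single substitution improves the score
        best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, history


rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.1, size=(4, 20))  # illustrative random weights, not a trained model
score = make_scorer(kernel, bias=-1.0)
start = "".join(rng.choice(list(AMINO_ACIDS), size=20))
generated, history = increase_score(start, score)
print(start, "->", generated)
print("score per round:", np.round(history, 3))
```

A higher score only means the discriminator accepts the sequence. Whether the result resembles a real DNA-binding protein has to be assessed separately, which corresponds to the evaluation step the abstract describes.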
Keywords/Search Tags: Protein primary structure, Deep learning, Convolutional neural network, Explainable subnetwork, Sequence generation