Font Size: a A A

On The Prediction Of DNA-binding Proteins Only From Primary Sequences:A Deep Learning Approach

Posted on:2019-11-18Degree:MasterType:Thesis
Country:ChinaCandidate:Y H QuFull Text:PDF
GTID:2370330593451020Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The interaction between DNA-binding protein and DNA realizes many functions such as transcription,replication,selective scission and methylation,and thus plays an irreplaceable role in the regulation of organisms.The prediction of protein function based on amino acid sequence has gradually become a Important task.With the construction of various kinds of protein databases,more and more researchers began to dig useful information from massive biological data to explore the meaning of life.In recent years,a variety of statistics and machine learning methods have been proposed for predicting the function of DNA-binding proteins.These methods rely on feature sets constructed from protein structures and functional properties,and do not achieve satisfactory predictive results on large data sets.Since it is a difficult task to construct suitable features,this paper proposes a deep learning model that combines convolutional neural networks with long and short memory-dependent networks to predict DNA-binding proteins based on amino acid sequences.The model uses a two-layer convolutional neural network to search for sequence domains and retains the positional dependence of amino acids in the sequence through LSTM,and avoids tedious manual extraction by automatically learning features.This article describes several representative methods of extracting amino acid sequence features and combines them with traditional machine learning classification algorithms.According to the experimental results of the model on balanced datasets,unbalanced datasets and low-redundant datasets,it is proved that the deep learning model has obvious advantages in large-scale dataset prediction tasks.Compared with the traditional machine learning classification algorithm,Our model is a promising tool for identifying DNA-binding proteins.According to the experiments,the deep learning model combining CNN and LSTM has good reliability and generalization ability,and has a significant effect on the prediction of DNA binding protein based on the original amino acid sequence.Therefore,this model is a powerful DNA binding protein prediction tool,and has a wide range of applications in the field of biological information.
Keywords/Search Tags:DNA-Binding Protein, Convolutional Neural Network, Long-Short Term Neural Network, Deep Learning
PDF Full Text Request
Related items