
Identification Of DNA-binding Proteins Based On Sequence Information

Posted on: 2022-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: L L Zhou
GTID: 2480306527983039
Subject: Computer Science and Technology
Abstract/Summary:
With the successful implementation of the Human Genome Project and Molecular Sequencing Project, many proteins involved in human life activities have been characterized. The number of protein sequences in well-known biological databases such as PDB, Swiss-Prot, and SCOP grows exponentially every year, so efficient methods for processing this mass of data are urgently needed. DNA-binding proteins (DBPs) are indispensable in living organisms: they play a vital role in DNA transcription, replication, recombination, and repair, and are also related to many diseases. Traditional laboratory methods can accurately determine the structure of DNA-binding proteins and the specific patterns of DNA-protein interaction, but they place strict requirements on experimental equipment and environment and consume much time and money. With the rapid development of big data and artificial intelligence technology, designing computational methods that accurately predict DNA-binding proteins at large scale has become a research hotspot in bioinformatics. Based on protein sequence information, this dissertation develops three efficient DNA-binding protein predictors using machine learning and deep learning algorithms. The main research work is as follows:

(1) The dissertation proposes iDBP-DEP, a novel DNA-binding protein predictor based on multi-view features and feature selection. First, multi-view feature sources, including the protein evolutionary profile, dipeptide composition, and physicochemical properties, are fused and selected to judge effectively whether a protein can interact with DNA. Second, a new sequence-based feature, PSSM-DBT, is proposed to address the shortage of protein encoding methods. This feature combines the distance bigram transformation with the position-specific scoring matrix (PSSM), which enhances the identification ability of the iDBP-DEP model. Finally, PSSM-DBT is analyzed further to explore the biological principle behind its excellent performance. The results of the jackknife test and independent test on three datasets verify the good detection performance of iDBP-DEP.

(2) The dissertation develops a DNA-binding protein predictor that combines HMM, the physicochemical properties of amino acids, and the secondary structure of proteins. First, instead of the traditional evolutionary profile generated by PSI-BLAST, a Hidden Markov Model (HMM) matrix is used to extract amino acid features that efficiently express the evolutionary information of proteins. Second, C-T-D is used to encode the physicochemical properties and secondary structure of amino acids to characterize the global sequence composition of proteins. Finally, a feature selection algorithm selects the optimal feature subsets, which are input into SVM and LightGBM classifiers to determine whether a protein can interact with DNA. The experimental results confirm the practicability and effectiveness of the proposed method and demonstrate the outstanding characterization ability of the HMM as a new evolutionary profile.

(3) Based on a deep learning model, the dissertation proposes a DNA-binding protein identification method that integrates multi-layer convolution and a bidirectional GRU network. First, positive and negative samples are extracted from the Swiss-Prot database and processed into datasets to provide sufficient training data. Second, amphiphilic pseudo amino acid composition features are used as network input to represent the amino acid composition and physicochemical properties of the protein. Then, a SoftPool layer replaces traditional pooling layers to retain local protein feature information. Finally, through end-to-end offline training, the convolution module and bidirectional GRU module of the proposed model learn to model deep protein features and to recognize DNA-binding proteins. Experimental results demonstrate the effectiveness of the proposed method.
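To make two of the sequence-based features named in contribution (1) more concrete, the following is a minimal Python sketch, not the thesis implementation. The function names `dipeptide_composition` and `distance_bigram` are hypothetical. The dipeptide composition follows its standard definition (frequency of each of the 400 ordered residue pairs). The distance bigram transformation is written in one common formulation, summing the product of PSSM column i at position k and column j at position k + d over all k; the exact definition and any normalisation used for PSSM-DBT in the thesis are not given in the abstract and are assumed here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for a, b in zip(sequence, sequence[1:]):
        if a + b in counts:  # skip pairs containing non-standard residues
            counts[a + b] += 1
            total += 1
    if total == 0:
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]


def distance_bigram(pssm, distance=1):
    """Distance bigram transformation of a PSSM (assumed formulation).

    `pssm` is a list of L rows, one row per sequence position, each row
    holding one score per amino-acid column. Feature (i, j) is the sum over
    positions k of pssm[k][i] * pssm[k + distance][j]. No normalisation is
    applied; the thesis may normalise differently.
    """
    if not pssm:
        return []
    n_cols = len(pssm[0])
    features = [0.0] * (n_cols * n_cols)
    for k in range(len(pssm) - distance):
        row, later = pssm[k], pssm[k + distance]
        for i in range(n_cols):
            for j in range(n_cols):
                features[i * n_cols + j] += row[i] * later[j]
    return features
```

With a real 20-column PSSM, `distance_bigram` yields 400 values per distance, so concatenating several distances with the 400 dipeptide frequencies gives the kind of fused multi-view vector that a feature selection step would then prune.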
Keywords/Search Tags: DNA-binding proteins, Feature selection, Machine learning, Deep learning