
A Deep Learning Model Based On Self-Attention Mechanism For Identification And Functional Annotation Of DNA-Binding Proteins

Posted on: 2021-01-07 | Degree: Master | Type: Thesis
Country: China | Candidate: C Y Yang
GTID: 2480306548481884 | Subject: Computer technology
Abstract/Summary:
DNA-binding proteins play a vital role in cellular activities. Identifying DNA-binding proteins and annotating their functions from protein sequence information is one of the main challenges in bioinformatics research. Traditional machine learning methods can only handle small-scale datasets. Deep-learning-based prediction methods have achieved strong results on large datasets, but they cannot locate the functional domains of DNA-binding proteins. This thesis proposes a deep learning model based on a self-attention mechanism that both identifies DNA-binding proteins and annotates their functional domains.

After passing through the encoding layer and the embedding layer, a protein sequence enters two stages: the first consists of a long short-term memory (LSTM) layer followed by a self-attention layer, and the second consists of a convolutional neural network (CNN) layer followed by a self-attention layer. The weighted vectors output by the self-attention layers of the two stages are concatenated into a single feature vector, which is fed to a fully connected layer and classified with a sigmoid function to predict whether the protein binds DNA.

In the first stage, the self-attention mechanism assigns a weight to each amino acid position. Analyzing the motifs formed by short runs of consecutive, highly weighted amino acids then makes it possible to annotate the functional domains of DNA-binding proteins.

The method achieves a prediction accuracy above 0.915 on datasets of different scales and types, and it finds three motifs that closely match DNA-binding sites. In the protein regions where these three motifs are located, the proportions of DNA-binding sites are 80%, 67%, and 55%, respectively. Biologically, the three motifs correspond to Domain: Homeobox, Feature key: Zinc finger C2H2-type, and Domain: MADS-box.
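The classification pipeline described above can be sketched in plain Python. This is an illustration under stated assumptions, not the thesis's implementation: it assumes a 20-letter amino-acid alphabet with id 0 reserved for padding and non-standard residues, and an additive attention score of the form v · tanh(W h_t), since the abstract does not specify the scoring function. The LSTM and CNN hidden states are taken as given rather than computed, and all names (`encode`, `attention_pool`, `classify`) and toy weights are hypothetical.

```python
import math
from typing import List, Tuple

# Assumption: standard 20-residue alphabet; id 0 is reserved for padding and
# non-standard residues. The thesis's actual encoding scheme is not given.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence: str, max_len: int) -> List[int]:
    """Encoding layer (hypothetical): residues -> integer ids, padded/truncated to max_len."""
    ids = [AA_INDEX.get(aa, 0) for aa in sequence.upper()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[List[float]], W: List[List[float]],
                   v: List[float]) -> Tuple[List[float], List[float]]:
    """Self-attention pooling over positions (assumed additive form).

    hidden: one vector per sequence position (e.g. LSTM or CNN outputs).
    Returns (per-position weights, weighted sum of the hidden vectors).
    Padding positions are not masked in this sketch.
    """
    scores = []
    for h in hidden:
        # Project h_t with W, apply tanh, then score with v: v . tanh(W h_t)
        projected = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        scores.append(sum(a * b for a, b in zip(v, projected)))
    weights = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(weights[t] * hidden[t][d] for t in range(len(hidden)))
              for d in range(dim)]
    return weights, pooled


def classify(lstm_vec: List[float], cnn_vec: List[float],
             fc_weights: List[float], fc_bias: float) -> float:
    """Concatenate both stages' vectors, apply one fully connected unit and a sigmoid."""
    features = lstm_vec + cnn_vec  # list concatenation = feature-vector concatenation
    z = sum(w * x for w, x in zip(fc_weights, features)) + fc_bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    print(encode("MKV", 5))  # [11, 9, 18, 0, 0]
    # Toy 2-d hidden states for a 3-residue sequence (stand-ins for LSTM/CNN outputs).
    lstm_h = [[0.1, 0.4], [0.9, -0.2], [0.3, 0.3]]
    cnn_h = [[0.5, 0.0], [0.2, 0.7], [-0.1, 0.6]]
    W = [[1.0, 0.0], [0.0, 1.0]]
    v = [1.0, 1.0]
    w_lstm, vec_lstm = attention_pool(lstm_h, W, v)
    _, vec_cnn = attention_pool(cnn_h, W, v)
    p = classify(vec_lstm, vec_cnn, [0.5, -0.5, 0.5, 0.5], 0.0)
    print([round(w, 3) for w in w_lstm], round(p, 3))
```

In this sketch the per-position weights returned for the LSTM stage (`w_lstm`) are the quantity the annotation step would inspect, while the pooled vectors from both stages only feed the classifier.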
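The annotation step, which turns first-stage attention weights into candidate functional fragments, might look roughly like the following. This is a hedged sketch: the abstract does not say how "higher weights" are thresholded or how long a fragment must be, so `threshold` and `min_len` are hypothetical parameters, and `binding_site_fraction` is only one plausible reading of the reported binding-site proportions. Motif discovery on the extracted fragments is not shown, and the sequence, weights, and binding sites in the example are made up.

```python
from typing import List, Set, Tuple


def high_weight_fragments(weights: List[float], threshold: float,
                          min_len: int) -> List[Tuple[int, int]]:
    """Find runs of consecutive positions whose attention weight >= threshold.

    Returns half-open (start, end) index pairs; runs shorter than min_len are dropped.
    threshold and min_len are hypothetical parameters, not values from the thesis.
    """
    fragments: List[Tuple[int, int]] = []
    start = None
    for i, w in enumerate(weights):
        if w >= threshold:
            if start is None:
                start = i  # a new high-weight run begins
        elif start is not None:
            if i - start >= min_len:
                fragments.append((start, i))
            start = None  # the run has ended
    # Close a run that extends to the last position.
    if start is not None and len(weights) - start >= min_len:
        fragments.append((start, len(weights)))
    return fragments


def binding_site_fraction(fragments: List[Tuple[int, int]],
                          binding_sites: Set[int]) -> float:
    """Fraction of residues inside the fragments that are annotated binding sites."""
    covered = [i for s, e in fragments for i in range(s, e)]
    if not covered:
        return 0.0
    return sum(1 for i in covered if i in binding_sites) / len(covered)


if __name__ == "__main__":
    # Toy example: 8 positions, made-up attention weights (summing to 1)
    # and made-up binding-site annotations.
    sequence = "MKRKLEAV"
    weights = [0.01, 0.20, 0.22, 0.25, 0.02, 0.14, 0.15, 0.01]
    frags = high_weight_fragments(weights, threshold=0.1, min_len=2)
    print(frags)                                       # [(1, 4), (5, 7)]
    print([sequence[s:e] for s, e in frags])           # ['KRK', 'EA']
    print(binding_site_fraction(frags, {1, 2, 3, 6}))  # 0.8
```

Requiring a minimum run length filters out isolated high-weight residues, so that only contiguous fragments, which are the natural input for motif analysis, are kept.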
Keywords/Search Tags: DNA-Binding Proteins, Functional Annotation, Self-Attention Mechanism, Convolutional Neural Network, Long Short-Term Memory Network