
Protein Fold Recognition Based On Fold-specific Features

Posted on: 2021-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: C C Li
GTID: 2370330611499755
Subject: Computer technology
Abstract/Summary:
With the rapid development of high-throughput sequencing technologies, the experimental determination of protein structure and function cannot keep pace with the growth of sequence data. Faced with massive amounts of protein sequence data, finding efficient, low-cost computational methods to predict and analyze the structure and function of unknown proteins has become one of the most difficult problems in biological sequence analysis. Protein fold recognition is one of the key tasks in studying protein structure and function. It matches a target protein with a template protein of low sequence similarity (less than 25%) and assigns both to the same fold type, whose members share similar structure and function, so that the structure and function of the target protein can be preliminarily inferred from those of the template protein.

For machine learning methods, performance depends mainly on the construction of discriminative feature vectors and classifiers, and designing feature extraction methods with strong discriminative power remains the bottleneck. This thesis introduces deep learning techniques to extract more discriminative fold-specific features from raw protein sequence data for protein fold recognition, while making the deep learning process more transparent and interpretable. The main research contents are as follows.

First, because the field lacks strongly discriminative fold-specific features, we propose fold-specific feature extraction methods based on deep neural networks. Two methods, CNN-BLSTM and DCNN-BLSTM, are designed. Convolutional neural networks extract local features that take protein structure and evolutionary information into account, a bidirectional LSTM network captures the dependencies among these local features, and the fold-specific features are obtained from the result. Feature analysis on the benchmark dataset LE indicates that the features extracted by the deep neural networks are fold-specific and strongly discriminative.

Second, because the fold-specific features extracted by deep neural networks lack interpretability and biological meaning, improved fold-specific feature extraction methods are proposed: convolutional neural networks based on protein structural motifs (Motif CNN and Motif DCNN). Introducing structural motifs into the convolutional neural network allows fold-specific features to be extracted and further discriminative features for fold recognition to be explored from the perspective of biological properties. On the benchmark dataset LE, this thesis analyzes the fold-specific features extracted by Motif DCNN from the structural information CCM. The analysis shows that these features are more discriminative than those extracted by the DCNN-BLSTM model. This thesis also explores the biological characteristics of the fold-specific features extracted by Motif DCNN, which better capture the structural information of the protein.

Third, two protein fold recognition predictors, Deep SVM-fold and Motif CNN-fold, are proposed to make full use of the fold-specific features extracted by deep learning and to fuse fold-specific features containing evolutionary and structural information. Both combine pairwise sequence similarity scores with an SVM: Deep SVM-fold computes the scores from fold-specific features extracted by the conventional deep neural networks, and Motif CNN-fold computes them from features extracted by the motif-based convolutional neural networks. Experimental results on the benchmark dataset LE show that Deep SVM-fold achieves excellent results, indicating that pairwise sequence similarity scores derived from evolutionary and structural information are effective for protein fold recognition. In addition, the fold recognition accuracy of Motif CNN-fold is 5.25% higher than that of Deep SVM-fold, indicating that feature extraction based on the motif-based CNN is more effective than feature extraction based on the conventional deep neural network; introducing biological properties helps to improve fold recognition performance.

In summary, this thesis focuses on protein fold recognition. Two kinds of fold-specific feature extraction methods based on deep learning are proposed, and fold recognition performance is further improved by a fusion strategy that uses pairwise similarity scores from evolutionary and structural information. Finally, feature analysis and experimental results on the benchmark dataset show that combining biological characteristics with deep learning makes the application of deep learning in bioinformatics more transparent and biologically meaningful.
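The abstract does not say how the pairwise similarity scores are computed or fused. As a minimal sketch only, assuming cosine similarity between fold-specific feature vectors and plain concatenation as the fusion step (both are assumptions, and every function name here is hypothetical), the score vector that would be handed to an SVM could be built like this:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_u * norm_v)


def similarity_profile(query, templates):
    """Score one protein's features against every template protein.

    The result has one entry per template, so every protein is
    represented by a vector of pairwise similarity scores.
    """
    return [cosine(query, t) for t in templates]


def fuse(profile_a, profile_b):
    """Fuse two score profiles by concatenation (an assumed strategy)."""
    return list(profile_a) + list(profile_b)


# Toy data, not taken from the thesis: two templates, 2-D features.
evo_templates = [[1.0, 0.0], [0.0, 1.0]]        # evolutionary view
struct_templates = [[1.0, 1.0], [1.0, -1.0]]    # structural view

fused = fuse(
    similarity_profile([2.0, 0.0], evo_templates),
    similarity_profile([3.0, 3.0], struct_templates),
)
print(fused)  # one score per template per view
```

In this sketch the fused vector has one entry per template and per information source; training an SVM on such vectors is left out, since the abstract gives no kernel or parameter details.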
Keywords/Search Tags: protein fold recognition, structural motif kernel, deep neural network, motif-based convolutional neural networks, support vector machine