Research On Protein Fold Recognition Based On Multi-view Learning Algorithm

Posted on:2021-01-05

Degree:Doctor

Type:Dissertation

Country:China

Candidate:K Yan

GTID:1480306569985229

Subject:Computer application technology

Abstract/Summary:

With the development of high-throughput gene sequencing technology, a large amount of protein sequence data has accumulated. However, the corresponding protein structure prediction and functional analysis lag behind. Predicting protein structure and function from sequence is therefore one of the hot spots in biological sequence analysis and proteomics, and research on protein fold recognition is important for predicting protein structure and analyzing function. In recent decades, researchers have proposed many machine learning algorithms for protein fold recognition, many of which design discriminative features and classifiers to obtain better recognition results. However, several problems remain. (1) Ensemble algorithms are constructed by fusing features derived from different attributes, but the fused features contain redundant or irrelevant features. (2) The similarity between sequences has not been introduced into the classification model. Multi-view features derived from different attributes of protein sequences, together with the corresponding classification methods, are an effective way to address these problems. This dissertation proposes several multi-view learning algorithms to solve them and to improve recognition performance. Specifically, the following classification methods are proposed:

(1) To address the problem that the fused features contain redundant information, we propose a multi-view learning algorithm based on discriminative features. We use a sparse projection matrix based on the L2,1 norm to extract discriminative features for each view and construct the latent space from the multiple views. The proposed method introduces a more discriminative objective function by enlarging the distance between different folds. Experimental results on multiple protein fold benchmark datasets show that the algorithm can effectively use the different features and improve classification performance.

(2) To address the problem of false-positive results in the alignment algorithm, we propose a multi-view learning algorithm based on multiple sequence similarity features. The proposed method uses three alignment algorithms based on different attributes to calculate the similarity between sequences, and the alignment scores are used to construct the multi-view features. A latent subspace is constructed from the complementary information in the different views. To further improve the accuracy of the alignment scores, we also propose a multi-view learning algorithm based on the similarity features of multiple pseudo-protein sequences. This method converts the original sequence into a pseudo-protein sequence that contains evolutionary information. Experimental results on the benchmark dataset show that the proposed methods improve predictive performance.

(3) To address the problem that the classification model cannot make full use of the similarity information between protein sequences, we propose an adaptive weight graph embedding multi-view learning algorithm. The proposed method constructs a Laplacian matrix for each view to characterize the similarity between protein sequences. Based on the complementarity and consistency between the different views, the method introduces an adaptive weighted constraint to balance the importance of each view, which yields a more compact low-dimensional representation. Experimental results show that the algorithm can use the similarity between sequences in each view to improve recognition accuracy.

(4) To address the noise in the similarity features constructed by the alignment algorithm and the limitations of any single algorithm, we propose, respectively, a multi-view learning algorithm based on low-rank constraint modeling and a decision fusion algorithm that combines this multi-view learning algorithm with an alignment algorithm. The proposed algorithm extracts the low-rank principal features of each view to suppress the noise introduced by the different alignment algorithms. It introduces a weighted combination constraint to construct the latent subspace and builds a discriminant regression function to improve predictive performance. In addition, it uses an effective strategy to fuse the prediction results of the different methods. Experimental results on multiple protein fold databases show that the proposed method can effectively improve prediction performance.

In summary, more robust and flexible multi-view learning algorithms are proposed to address the shortcomings of traditional algorithms for protein fold recognition. In addition, we analyze the rationale of the proposed methods from a theoretical perspective. Experimental results on several benchmark datasets demonstrate the effectiveness of the proposed algorithms.
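Two building blocks named in the abstract have standard textbook definitions that can be sketched briefly: the L2,1 norm of a projection matrix (the sum of the Euclidean norms of its rows, which encourages whole rows to become zero) and the Laplacian matrix L = D - S of a per-view similarity graph. The sketch below is only an illustration under stated assumptions, not the dissertation's implementation: the function names are hypothetical, the similarity matrices are toy values, and the view weights are fixed by hand, whereas the abstract describes weights that are learned adaptively.

```python
import math


def l21_norm(matrix):
    """L2,1 norm: sum over rows of each row's Euclidean (L2) norm."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


def graph_laplacian(similarity):
    """Unnormalized Laplacian L = D - S of a symmetric similarity matrix S,
    where D is the diagonal matrix of row sums (node degrees)."""
    n = len(similarity)
    degrees = [sum(row) for row in similarity]
    return [
        [(degrees[i] if i == j else 0.0) - similarity[i][j] for j in range(n)]
        for i in range(n)
    ]


def combine_laplacians(laplacians, weights):
    """Weighted sum of per-view Laplacians.

    Assumption for illustration: weights are given, non-negative, and sum
    to 1. In an adaptive scheme they would be optimized jointly with the
    embedding rather than fixed in advance.
    """
    if len(laplacians) != len(weights):
        raise ValueError("need one weight per view")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(laplacians[0])
    return [
        [sum(w * lap[i][j] for w, lap in zip(weights, laplacians)) for j in range(n)]
        for i in range(n)
    ]


# Toy projection matrix: the all-zero row contributes nothing to the norm.
projection = [[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]]
print(l21_norm(projection))  # 5 + 0 + 13 = 18.0

# Two hypothetical views of the same three sequences (toy similarity scores).
view_a = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.2], [0.1, 0.2, 0.0]]
view_b = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.4], [0.5, 0.4, 0.0]]
fused = combine_laplacians(
    [graph_laplacian(view_a), graph_laplacian(view_b)], [0.6, 0.4]
)
```

Each row of a Laplacian sums to zero, and a weighted sum of Laplacians keeps that property, so the fused matrix can be used wherever a single-view Laplacian would be.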

Keywords/Search Tags:

Protein fold recognition, multi-view learning, graph learning, low-rank representation, alignment method, protein sequence similarity scores

Related items

1	Protein Remote Homology Detection And Folding Recognition Based On SCOP Topology
2	Research On Graph Embedded Clustering Models
3	Multi-view Learning In DNA-binding Protein Identification
4	Protein Fold Recognition Based On Amino Acid Sequence
5	Research On Network Alignment Based On Local Structure Feature Enhancement
6	Research And Application Of Multi-task-oriented Network Representation Learning Method
7	Research On Models And Algorithms Of Graph Neural Networks
8	A Research On Deep Multi-View Clustering Model Based On Graph Representation Learning
9	Multi-view Joint Learning Method For Region Representation Based On Graph Structure
10	Research On Link Prediction Methods Based On Graph Autoencoder