
Research On Protein Fold Recognition Based On Multi-view Learning Algorithm

Posted on: 2021-01-05; Degree: Doctor; Type: Dissertation
Country: China; Candidate: K Yan
GTID: 1480306569985229; Subject: Computer application technology
Abstract/Summary:
With the development of high-throughput gene sequencing technology, a large amount of protein sequence data has accumulated. However, structure prediction and functional analysis for these proteins have not kept pace. Predicting protein structure and function from sequence is therefore one of the hot spots in biological sequence analysis and proteomics, and research on protein fold recognition is important for both structure prediction and functional analysis. In recent decades, researchers have proposed many machine learning algorithms for protein fold recognition, many of which design discriminative features and classifiers to obtain better recognition results. However, several problems remain. (1) Ensemble algorithms are constructed by fusing features derived from different attributes, but the fused features contain redundant or irrelevant features. (2) The similarity between sequences has not been introduced into the classification model. Multi-view features derived from different attributes of protein sequences, together with the corresponding classification methods, are an effective way to address these problems. This dissertation proposes several multi-view learning algorithms to solve them and to improve the performance of fold recognition. Specifically, the following classification methods are proposed:

(1) To address the problem that the fused features contain redundant information, we propose a multi-view learning algorithm based on discriminative features. We use a sparse projection matrix based on the L2,1 norm to extract discriminative features for each view and construct a latent space from the multiple views. The proposed method introduces a more discriminative objective function by enlarging the distance between different folds. Experimental results on multiple protein fold benchmark datasets show that the algorithm can effectively use the different features and improve classification performance.

(2) To address the problem of false-positive results in the alignment algorithm, we propose a multi-view learning algorithm based on multiple sequence similarity features. The proposed method uses three alignment algorithms based on different attributes to calculate the similarity between sequences, and the alignment scores are used to construct the multi-view features. We construct a latent subspace based on the complementary information from the different views. To further improve the accuracy of the alignment scores, we propose a multi-view learning algorithm based on the similarity features of multiple pseudo-protein sequences. The proposed method converts the original sequence into a pseudo-protein sequence that contains evolutionary information. Experimental results on the benchmark dataset show that the proposed methods can improve predictive performance.

(3) To address the problem that classification models cannot make full use of the similarity information between protein sequences, we propose an adaptive weight graph embedding multi-view learning algorithm. The proposed method constructs a Laplacian matrix for each view to characterize the similarity between protein sequences. Based on the complementarity and consistency between the different views, the method introduces an adaptive weighted constraint to balance the importance of each view, which yields a more compact low-dimensional representation. Experimental results show that the algorithm can use the sequence similarity within each view to improve recognition accuracy.

(4) To address the problems that the similarity features constructed by the alignment algorithm contain noise and that a single algorithm has limited capability, we propose a multi-view learning algorithm based on low-rank constraint modeling, and a decision fusion algorithm that combines this multi-view learning algorithm with an alignment algorithm. The proposed algorithm extracts the low-rank main features of each view to suppress the noise introduced by the different alignment algorithms. It introduces a weighted combination constraint to construct the latent subspace and builds a discriminant regression function to improve predictive performance. In addition, it provides an effective strategy for fusing the prediction results of different methods. Experimental results on multiple protein fold databases show that the proposed method can effectively improve prediction performance.

In summary, more robust and flexible multi-view learning algorithms are proposed to address the shortcomings of traditional algorithms for protein fold recognition. In addition, we analyze the rationale of the proposed methods from a theoretical perspective. Experimental results on several benchmark datasets demonstrate the effectiveness of the proposed algorithms.
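
The abstract names two standard building blocks without giving their formulas: the L2,1 norm of a projection matrix (method 1) and a per-view graph Laplacian combined with view weights (method 3). The sketch below illustrates only the textbook definitions of these quantities in plain Python. It is not the dissertation's objective function: the function names, the toy matrices, and the fixed view weights are all assumptions made for illustration, whereas the dissertation learns the weights adaptively.

```python
import math


def l21_norm(matrix):
    """L2,1 norm: the sum of the Euclidean norms of the rows of `matrix`.

    Penalizing this norm pushes whole rows toward zero, which is why it is
    commonly used to select a sparse subset of features.
    """
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


def graph_laplacian(similarity):
    """Unnormalized graph Laplacian L = D - W of a symmetric similarity matrix W.

    D is the diagonal degree matrix whose i-th entry is the sum of row i of W.
    """
    n = len(similarity)
    degrees = [sum(row) for row in similarity]
    return [
        [(degrees[i] if i == j else 0.0) - similarity[i][j] for j in range(n)]
        for i in range(n)
    ]


def fuse_laplacians(laplacians, weights):
    """Weighted sum of per-view Laplacians.

    Assumption: `weights` are supplied by the caller. This sketch does not
    learn them.
    """
    n = len(laplacians[0])
    return [
        [sum(w * lap[i][j] for w, lap in zip(weights, laplacians)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical projection matrix: the all-zero row corresponds to a feature
# that the sparse projection discards.
projection = [[3.0, 4.0], [0.0, 0.0], [0.0, 5.0]]

# Hypothetical similarity matrices for three sequences under two views.
view_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
view_b = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]

fused = fuse_laplacians(
    [graph_laplacian(view_a), graph_laplacian(view_b)], [0.5, 0.5]
)

print(l21_norm(projection))
print(fused)
```

Each row of a Laplacian built this way sums to zero, and a weighted sum of Laplacians keeps that property, so the fused matrix can be used wherever a single-view Laplacian would be.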
Keywords/Search Tags:Protein fold recognition, multi-view learning, graph learning, low-rank representation, alignment method, protein sequence similarity scores