Prediction Of Protein Secondary Structure Based On PSSM Via Semi-supervised Learning

Posted on:2021-09-01

Degree:Master

Type:Thesis

Country:China

Candidate:S D Zou

Full Text:PDF

GTID:2480306515994659

Subject:Applied Statistics

Abstract/Summary:

Understanding the structure and function of proteins is of great practical significance in life science, agriculture, and medicine, and the prediction of protein secondary structure is an important part of protein structure research. Using machine learning to predict secondary structure from the position-specific scoring matrix (PSSM) is an important method in bioinformatics. To reduce the dependence on high-quality annotated data, it is of both informatics and biological significance to explore semi-supervised learning as a way to mine protein sequence data and predict secondary structure. The core of such an algorithm is to fully exploit the useful information in unlabeled data and fuse it efficiently with the labeled data. At the same time, well-designed feature engineering can effectively improve the recognition performance for protein secondary structure. This thesis carries out the following work on this problem:

(1) An effective feature representation is designed for the PSSM. Taking into account the evolutionary information of different amino acids and the information shared between adjacent and non-adjacent residues, several PSSM-based feature representations are designed that map the PSSM to numerical feature vectors with strong discriminative ability.

(2) Because the feature representation produces high-dimensional feature vectors, the role of feature selection in structure prediction is examined. Filter methods based on statistical information are used to eliminate the redundant and irrelevant features generated during feature representation, and experiments compare their effects on semi-supervised learning performance.

(3) After introducing a variety of semi-supervised learning algorithms, the ladder network is put forward. This model integrates supervised and unsupervised learning: structurally, it relies on a denoising mechanism and on connections that bridge the encoder and decoder to achieve semi-supervised learning.

(4) Comparative experiments on the D8244 and D640 data sets, grouped under three different labeling ratios, show that the ladder network semi-supervised model is more accurate than other classical semi-supervised models under the same external conditions. In addition, the parameters and the feature representation of the combined model were optimized, and the performance of the resulting model was comparable to that of the traditional supervised algorithms SVM and RF.

The semi-supervised learning algorithm based on the ladder network is therefore of practical use in protein secondary structure recognition, and the preceding feature engineering can improve the performance of the model. The method proposed in this thesis can be applied to the prediction of protein secondary structure, and the research approach is also of informatics and biological significance for combining data mining methods with cutting-edge problems in biological science.
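
As an illustration of the kind of pipeline described in items (1) and (2), the following minimal sketch builds per-residue feature vectors from a PSSM with a sliding window over neighbouring residues and then applies a simple statistical filter. It is not the thesis's actual method: the sliding window, the logistic scaling, the variance criterion, and every function name here (`sigmoid_scale`, `window_features`, `variance_filter`, `select_columns`) are assumptions chosen for illustration, and the toy matrix has 2 columns rather than the 20 of a real PSSM.

```python
import math
from statistics import pvariance


def sigmoid_scale(pssm):
    """Squash raw PSSM scores into (0, 1) with a logistic function (assumed scaling)."""
    return [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]


def window_features(pssm, half_window=1):
    """Build one feature vector per residue from its PSSM row and its neighbours.

    Positions that fall outside the sequence are zero-padded.
    """
    n_cols = len(pssm[0])
    pad = [0.0] * n_cols
    features = []
    for i in range(len(pssm)):
        vec = []
        for j in range(i - half_window, i + half_window + 1):
            if 0 <= j < len(pssm):
                vec.extend(float(v) for v in pssm[j])
            else:
                vec.extend(pad)
        features.append(vec)
    return features


def variance_filter(features, threshold=0.0):
    """Return indices of feature columns whose population variance exceeds threshold."""
    n_features = len(features[0])
    return [
        k for k in range(n_features)
        if pvariance([row[k] for row in features]) > threshold
    ]


def select_columns(features, keep):
    """Keep only the feature columns listed in `keep`."""
    return [[row[k] for k in keep] for row in features]


# Toy example: 3 residues, 2 PSSM columns; the second column is constant.
toy_pssm = [[1, 0], [2, 0], [3, 0]]
X = window_features(toy_pssm, half_window=1)
keep = variance_filter(X)            # constant (uninformative) columns are dropped
X_selected = select_columns(X, keep)
```

With a real PSSM of shape (sequence length × 20) and `half_window=w`, each residue would get a vector of length 20 × (2w + 1). A larger window captures more context from non-adjacent residues at the cost of higher dimensionality, which is where a filter step of this kind becomes relevant.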

Keywords/Search Tags:

Semi-supervised learning, position-specific scoring matrix, feature engineering, ladder network, recognition accuracy

Related items

1	Study On Semi-supervised Generative Adversarial Network Models For Predicting Protein Secondary Structures
2	Research On SSVEP Classification Algorithm Based On Semi-supervised Ladder Network
3	Research On Presentation Learning Methods Of Semi-supervised Network Based On Deep Learning
4	The Dynamic Method Of Transcription Factor Binding Sites Recognition Based On Genetic Algorithm And Position Specific Scoring Matrix
5	The Research On Feature Extraction For The Prediction Of Amyloid Sequences Regions
6	The Classification Of Quantum Correlations Based On Semi-Supervised Machine Learning
7	Research On Semi-supervised Community Detection Based On Constraint Matrix And Linear Representation
8	Research On Protein Subcellular Localization Based On Feature Extraction
9	Pattern Analysis And Recognition Of Image-based Protein Subcellular Location
10	Prediction Of Bacterial Type Ⅳ Secreted Effectors And Phage Virion Proteins By Integrating Sequence And Evolutionary Information