
The Research On Feature Extraction For The Prediction Of Amyloid Sequences Regions

Posted on: 2021-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhou
Full Text: PDF
GTID: 2480306050472544
Subject: Applied Mathematics
Abstract/Summary:
Amyloid is a kind of insoluble fibrous protein. Its misfolding is associated with diseases such as Alzheimer's disease and Parkinson's disease. There is also growing evidence that many proteins can be converted into highly organized amyloid fibrils under given conditions, whether in vivo or in vitro. These stable and irreversible properties also make amyloid a new nanostructured material. Research on amyloid formation is therefore essential and meaningful. Amyloid formation is correlated with the aggregation of certain regions of the sequence. This thesis aims to predict whether a sequence segment is an aggregation hot-spot region. One of the most important steps in this task is feature extraction. Here, we focus on developing extraction methods based on numerically encoded protein sequences and on evolutionary information. The main contributions are summarized as follows:

(1) We present a novel feature representation called PhyAve_PSSMDwt, which consists of two parts. The first is based on five physicochemical properties (hydrophilicity, hydrophobicity, aggregation rate, expected packing density and H-bonding), from which we obtain 15-dimensional features in total. The second is a set of 60-dimensional features obtained by applying the Discrete Wavelet Transform (DWT) to the Position-Specific Scoring Matrix (PSSM), followed by recursive feature selection. The experimental results on the Pep424 dataset show that the PSSM information greatly improves predictive performance. Compared with other published algorithms, this method scores higher in cross-validation by 3.0%, 3.3%, 0.026 and 0.055 in accuracy, specificity, Matthews correlation coefficient and AUC value, respectively. This indicates that the feature representation, combined with our prediction model, is effective and competitive.

(2) We propose a new feature extraction method called PN_AC, which builds on Auto Covariance (AC) features and also takes the positive and negative values of the sequence into account. Based on the prediction results obtained with PN_AC and PhyAve, we select 15 physicochemical properties from the AAindex1 database, construct a property matrix from them, and then fuse it with the evolutionary information from the PSSM. A new feature set, which combines the PN_AC features of each column with PhyAve, is used for prediction after Maximum-Relevance Minimum-Redundancy feature selection. The experimental results show that PN_AC is effective for predicting aggregation regions and that it performs better than Auto Covariance. In addition, the overall accuracy, sensitivity, Matthews correlation coefficient and AUC value of PhyAve_PSSMDwt improve by 0.6%, 2.6%, 0.013 and 0.004, respectively, after the information from the 15 properties is added, which indicates that a richer, selectively chosen set of physicochemical properties helps improve sensitivity.
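As a rough illustration of the Auto Covariance (AC) features that the abstract builds on, the sketch below encodes a peptide with a single physicochemical scale and computes the standard lag-based auto covariance. It is a minimal sketch under stated assumptions, not the thesis's implementation: the Kyte-Doolittle hydropathy scale, the function names `encode` and `auto_covariance`, the example peptide and the choice of `max_lag` are all illustrative, and the positive/negative handling that distinguishes PN_AC is not described in this abstract, so it is not reproduced here.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values, used here only as an example scale.
# Assumption: the thesis's actual property set (AAindex1 entries) is not
# listed in the abstract, so this choice is purely illustrative.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence: str, scale: Dict[str, float]) -> List[float]:
    """Map each residue to its value on the given property scale."""
    return [scale[residue] for residue in sequence]


def auto_covariance(values: List[float], max_lag: int) -> List[float]:
    """Return the auto covariance for lags 1..max_lag.

    For each lag, the mean-centered value at position i is multiplied by
    the mean-centered value at position i + lag, and the products are
    averaged over the (n - lag) available pairs.
    """
    n = len(values)
    if max_lag < 1 or max_lag >= n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(values)")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    features = []
    for lag in range(1, max_lag + 1):
        total = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


if __name__ == "__main__":
    # Hypothetical example peptide; max_lag=3 is an arbitrary choice.
    peptide = "VQIVYK"
    print(auto_covariance(encode(peptide, KYTE_DOOLITTLE), max_lag=3))
```

Applying the same function to each column of a property matrix or a PSSM, one column at a time, would yield `max_lag` features per column, which is the usual way AC turns variable-length sequences into fixed-length vectors.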
Keywords/Search Tags: prediction of amyloid regions, feature extraction, support vector machine, physicochemical properties, Position-Specific Scoring Matrix, Discrete Wavelet Transform, Auto-Covariance Feature