An Improved Prediction Method For Protein Secondary Structure Based On Support Vector Machine And Application On Vascular Protein Research

Posted on: 2008-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
GTID: 2120360215977127
Subject: Biomedical engineering
Abstract/Summary:
Advances in proteomics research have revealed a vast number of protein sequences. More in-depth studies, however, require knowledge of the 3D structures of proteins in order to understand the relationship between protein structure and function, which has made structure determination one of the most active fields of biological research in the post-genome era.

To date, only about forty thousand protein structures have been determined experimentally, far fewer than the number of known protein sequences. It is therefore indispensable to develop theoretical prediction methods based on computational and mathematical approaches. Unfortunately, predicting 3D structure directly from primary structure is not a straightforward task. Much about a 3D structure can nevertheless be inferred from secondary structure because of its inherent structural regularity. The prediction of secondary structure is therefore a key step in predicting 3D structure from protein sequence.

In this study, an improved prediction method based on a 24-dimension combined coding scheme and a support vector machine was developed. To demonstrate the advantage of the method, three independent datasets consisting of 141, 212, and 365 non-homologous protein chains, respectively, were extracted from the PDB SELECT and HSSP databases. Seven-fold cross-validation on the 365-protein dataset gave a Q3 of 78.96%, almost 3% higher than that obtained with the classical 22-dimension matrix. Furthermore, given the results obtained on the three datasets in increasing order of size, we may infer that the Q3 advantage of our method can be expected to grow as the dataset is expanded further.

Finally, we discuss the advantages of the method and its application prospects in vascular proteomics by evaluating the prediction performance of different methods on three hypertension-related proteins.
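
For readers unfamiliar with the Q3 measure quoted in the abstract, it is conventionally the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The following is a minimal sketch of that standard definition, not code from the thesis; the function name `q3_score` and the example strings are hypothetical and chosen only for illustration.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label
    (H = helix, E = strand, C = coil) matches the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical 10-residue example (not data from the thesis).
observed = "HHHHCCEEEC"
predicted = "HHHCCCEEEE"
print(f"Q3 = {q3_score(predicted, observed):.1f}%")  # 8 of 10 residues match
```

In a k-fold cross-validation setting such as the 7-fold scheme mentioned above, a score of this kind would typically be computed on each held-out fold and then pooled or averaged; how the thesis aggregates the folds is not stated in the abstract.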
Keywords/Search Tags: support vector machine, secondary structure prediction, dataset, cross-validation, proteomics