
Study On The Novel Methods For The Prediction Of Protein Families

Posted on: 2011-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: J H Huang
Full Text: PDF
GTID: 2120360308973919
Subject: Analytical Chemistry
Abstract/Summary:
With the arrival of the post-genome era, proteomics is becoming an important research domain in the life sciences. With the rapid development of modern biological science and technology, protein sequence data are accumulating at an explosive pace, and analyzing these data to extract useful information is a central topic in modern bioinformatics. Although the structures and functions of these proteins can be determined experimentally, the experiments are time-consuming and expensive. It is therefore highly desirable to develop automated and reliable methods that predict structure and function from the primary protein sequence. Given the current state of bioinformatics research, a new method that couples the discrete wavelet transform (DWT) with the support vector machine (SVM) is proposed to predict protein structure and function based solely on the primary sequence, including the physicochemical properties of its constituent amino acids. The main contents are as follows:

(1) A novel predictor is developed for the identification and classification of G-protein coupled receptors (GPCRs) by coupling DWT with SVM. The method has three steps. First, the protein sequences are transformed into numerical signals using amino acid physicochemical properties. Second, the DWT is applied to extract frequency-band features. Finally, the SVM algorithm is used to build a model from these feature vectors. The cross-validation results demonstrate that GPCRs can be correctly identified with an accuracy of 99.72%, 97.64%, and 99.20% at the family level, subfamily level, and sub-subfamily level, respectively. These prediction performances are all better than those of previous methods, and the method shows a significant increase in prediction performance even compared with the most recent prediction methods.
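The first two of the three steps described above (sequence to numerical signal, then DWT feature extraction) can be illustrated with a minimal sketch. Several choices here are assumptions rather than details from the thesis: the Kyte-Doolittle hydropathy index stands in for the unspecified physicochemical properties, the Haar wavelet stands in for the unspecified wavelet function, three decomposition levels are used, and each frequency band is summarized by its maximum, minimum, mean and standard deviation. The function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy index. This is an ASSUMED property scale;
# the abstract does not say which physicochemical properties were used.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_to_signal(seq):
    """Step 1: map each residue to a property value (unknown residues -> 0.0)."""
    return [HYDROPATHY.get(res, 0.0) for res in seq.upper()]


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        # Pad odd-length input by repeating the last sample.
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def band_stats(coeffs):
    """Summarize one frequency band as [max, min, mean, standard deviation]."""
    mean = sum(coeffs) / len(coeffs)
    var = sum((c - mean) ** 2 for c in coeffs) / len(coeffs)
    return [max(coeffs), min(coeffs), mean, math.sqrt(var)]


def dwt_features(seq, levels=3):
    """Step 2: fixed-length feature vector for a non-empty sequence.

    Uses `levels` detail bands plus the final approximation band, so the
    vector length is 4 * (levels + 1) regardless of sequence length.
    """
    approx = sequence_to_signal(seq)
    features = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.extend(band_stats(detail))
    features.extend(band_stats(approx))
    return features


if __name__ == "__main__":
    # Hypothetical short peptide, used only to show the call.
    print(len(dwt_features("MKTAYIAKQR")))
```

Because every sequence yields a vector of the same length, the vectors can be handed directly to an SVM for the third step. In practice one would likely use an established wavelet library such as PyWavelets (`pywt.wavedec`) and an SVM implementation such as scikit-learn's `SVC` instead of hand-written code; those are suggestions, not the tools named in the thesis.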
All the results indicate that the method is powerful for GPCR prediction.

(2) A novel method for the prediction of enzyme family classes is developed by coupling DWT with SVM. Enzymes can be classified into six family classes, and the oxidoreductases comprise 16 subfamilies; the one-versus-one and one-versus-others training strategies were adopted, respectively, to decompose each multi-class problem into a series of binary SVMs. In addition, the choice of dilations, wavelet functions, amino acid physicochemical properties and kernel functions is discussed in detail. The jackknife test was performed on the datasets C1200 and C2640. The overall accuracies thus obtained were 91.9% and 99.17%, which are much higher than those of other methods. Compared with more recent prediction methods, which are in general more complex and require model assumptions, our method is simple to operate, intuitive, and performs reasonably well.

(3) Predicting the secondary structure of low-homology proteins remains a difficult problem. Based on the physical and chemical properties of amino acids, a promising predictive method is proposed to determine protein secondary structure. As a showcase, four standard datasets, C204, C359, W1189 and 25PDB, were used to assess the performance of the method. The results imply that combining multiple features makes better use of a protein's sequence information than any individual feature. The approach may serve as a powerful complementary tool to other existing methods in this area.

Complete processing programs are available for all of the above techniques, so they can be used and distributed easily.

This study was supported by the National Natural Science Foundation of China and Natural Foundation of Jiangxi Province.
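The multi-class decomposition and the jackknife test mentioned in part (2) can be sketched as follows. The sketch makes several assumptions: the jackknife test is taken to be leave-one-out validation, a nearest-centroid rule is a hypothetical stand-in for each trained binary SVM (so the code stays self-contained), and all function names and the toy data are illustrative rather than from the thesis. One-versus-others is shown only as a task enumeration.

```python
from itertools import combinations


def one_vs_one_tasks(labels):
    """One binary problem per unordered pair of classes: k * (k - 1) / 2 tasks."""
    return list(combinations(sorted(set(labels)), 2))


def one_vs_rest_tasks(labels):
    """One binary problem per class (that class against all others): k tasks."""
    return sorted(set(labels))


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_one_vs_one(train_x, train_y, query):
    """Majority vote over all pairwise binary decisions.

    Each binary decision uses a nearest-centroid rule, an ASSUMED
    stand-in for a trained binary SVM.
    """
    cents = {
        c: centroid([x for x, y in zip(train_x, train_y) if y == c])
        for c in set(train_y)
    }
    votes = {c: 0 for c in cents}
    for a, b in one_vs_one_tasks(train_y):
        winner = a if sq_dist(query, cents[a]) <= sq_dist(query, cents[b]) else b
        votes[winner] += 1
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(votes), key=votes.get)


def jackknife_accuracy(xs, ys, predict):
    """Leave-one-out test: hold out each sample once and train on the rest."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


if __name__ == "__main__":
    # Toy, well-separated data; NOT results from the thesis.
    xs = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
          [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
          [0.0, 5.0], [0.2, 5.1], [0.1, 4.8]]
    ys = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
    print(one_vs_one_tasks(ys))
    print(jackknife_accuracy(xs, ys, predict_one_vs_one))
```

For the six enzyme family classes, one-versus-one gives 15 binary problems and one-versus-others gives 6. For the 16 oxidoreductase subfamilies the counts are 120 and 16, which is one reason the choice of strategy affects training cost.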
Keywords/Search Tags: Discrete wavelet transform, support vector machines, protein family, physicochemical properties, classification