| With the successful completion of the Human Genome Project (HGP) and the rapid development of modern biological science and technology, protein sequence data are accumulating at an explosive pace. Analyzing these data to extract useful information is a central topic in modern bioinformatics. The classical experimental methods for protein structure analysis are X-ray crystallography and multidimensional nuclear magnetic resonance (NMR). Both are expensive and time-consuming. For some proteins, experimental limitations make it impossible to obtain a 3D structure at all, which in turn hinders understanding of their function. Protein sequencing, by contrast, is relatively fast, simple, and inexpensive. As a result, the gap between the number of known protein sequences and the number of known three-dimensional protein structures keeps widening. Automated and reliable methods for prediction from the primary sequence are therefore highly desirable. In view of the current state of research on protein structure and function, this work proposes a method that couples the discrete wavelet transform (DWT) with a support vector machine (SVM). The method predicts protein structure and function solely from the primary sequence, including the physical and chemical properties of its constituent amino acids. The main contents are as follows:
1. A new method based on the DWT and an SVM is developed for predicting protein structural class. It combines an SVM learning system with a novel representation of protein samples that takes sequence-order effects into account to some extent. As a showcase, the jackknife test was performed on a working dataset of 204 non-homologous proteins, and the results are very encouraging. Further experiments suggest that sequence homology has a significant impact on prediction accuracy. The approach may serve as a powerful tool complementary to other existing methods in this area.
2. A new method combining the DWT with an SVM is introduced to predict enzyme structure. Protein Data Bank entry 1A2J is used as an example to illustrate how the method predicts enzyme structure. The selection of an appropriate dilation, wavelet function, and type of hydrophobicity data is discussed in detail. As a demonstration, the predictive performance of the method was evaluated on two datasets: the standard dataset PA1178 generated by Paula and Andrew, and the dataset C1200 generated by Cai et al. In the jackknife test, the overall accuracies on the two datasets reach 95.59% and 93.75%, respectively. More recent prediction methods are in general more complex and require model assumptions. Compared with them, this non-parametric method is simple and intuitive to apply and performs reasonably well.
3. A predictive method based on amino acid hydrophobicity is proposed to determine the subcellular locations of apoptosis proteins. The method has three steps. First, each protein sequence is transformed into a numerical signal using the hydrophobicity values of its amino acids. Next, the DWT is applied to extract salient frequency-band features. Finally, an SVM is trained on these wavelet coefficients. As a showcase, three standard datasets, ZD98, ZW225, and CL317, were used to assess the performance of the method. The jackknife results are encouraging compared with existing prediction methods. They indicate that the method can help annotate unknown proteins and predict their subcellular locations in the absence of experimental data.
4. Current predictors of membrane protein type primarily target soluble proteins and ignore the characteristic topological domains of transmembrane proteins. A novel method that combines an SVM with a DWT of residue physicochemical properties is proposed to predict membrane protein types. Compared with more recent prediction methods, it shows a significant increase in prediction performance. All the results indicate that it is a powerful tool for predicting membrane protein types.
Complete processing programs have been implemented for all of the above techniques, so they can be readily used and disseminated. This study was supported by the National Natural Science Foundation of China and the Natural Foundation of Jiangxi Province. |
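The three-step pipeline for apoptosis-protein localization (hydrophobicity encoding, DWT feature extraction, SVM modeling) can be illustrated for its first two steps with a minimal sketch. This is not the authors' implementation, and several choices in it are assumptions. The abstract does not say which hydrophobicity scale, wavelet function, or dilation the study selected. The Kyte-Doolittle scale, the Haar wavelet, and the decomposition depth `levels` were therefore chosen here only for illustration. The per-level mean and standard deviation used as features are also an assumption. The function names `encode`, `haar_step`, and `wavelet_features` are hypothetical.

```python
import math
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values. ASSUMPTION: the study compared several
# hydrophobicity scales; this one is chosen here only for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Step 1: map each residue to its hydrophobicity value (unknown residues -> 0.0)."""
    return [KYTE_DOOLITTLE.get(res, 0.0) for res in sequence.upper()]


def haar_step(signal):
    """One level of the orthonormal Haar DWT; returns (approximation, detail)."""
    if len(signal) % 2:  # repeat the last sample so the length is even
        signal = signal + [signal[-1]]
    root2 = math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in pairs]  # low-pass band
    detail = [(signal[i] - signal[i + 1]) / root2 for i in pairs]  # high-pass band
    return approx, detail


def wavelet_features(sequence, levels=4):
    """Step 2 (hypothetical feature design): fixed-length vector from a multilevel
    Haar decomposition. For each level, keep the mean and population standard
    deviation of the detail coefficients; then add the same two statistics for
    the final approximation, giving 2 * (levels + 1) numbers per protein."""
    approx = encode(sequence)
    if not approx:
        raise ValueError("empty sequence")
    features = []
    for _ in range(levels):
        approx, detail = haar_step(approx)  # decompose the previous approximation
        features += [mean(detail), pstdev(detail)]
    features += [mean(approx), pstdev(approx)]
    return features
```

Summarizing each frequency band by a few statistics turns proteins of different lengths into vectors of the same length, which is what an SVM requires. For example, `wavelet_features("MKTAYIAKQR", levels=3)` returns 2 × (3 + 1) = 8 numbers. Changing `levels` plays a role loosely similar to choosing the dilation discussed in item 2.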
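Every study summarized above trains an SVM and reports jackknife (leave-one-out) accuracy. The following sketch trains one-vs-rest linear SVMs with Pegasos-style stochastic subgradient descent. It scores them by the jackknife test, where overall accuracy is the fraction of samples predicted correctly when each one is held out in turn. It is plain Python so it runs without third-party packages. The abstract does not state the kernel, solver, or parameters used, so several elements here are assumptions: the linear kernel, the Pegasos solver, and the values of `lam` and `epochs`. All function names are hypothetical. In practice, each protein's DWT feature vector would replace the toy two-dimensional points.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Binary linear SVM (labels +1/-1) trained by Pegasos-style stochastic
    subgradient descent on the L2-regularized hinge loss. A constant 1.0 is
    appended to each sample so the bias is learned (and regularized) as a weight."""
    rng = random.Random(seed)
    data = [(list(xi) + [1.0], yi) for xi, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for xi, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size 1 / (lambda * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step for the L2 term
            if margin < 1.0:  # hinge loss is active: step toward this sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def train_one_vs_rest(X, labels, **kw):
    """One binary SVM per class: that class is +1, every other class is -1."""
    return {
        c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kw)
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Return the class whose SVM gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], xb)))


def jackknife_accuracy(X, labels, **kw):
    """Jackknife (leave-one-out) test: each sample is predicted by models trained
    on all the other samples; overall accuracy = correct predictions / N."""
    correct = 0
    for i in range(len(X)):
        models = train_one_vs_rest(X[:i] + X[i + 1:], labels[:i] + labels[i + 1:], **kw)
        correct += predict(models, X[i]) == labels[i]
    return correct / len(X)
```

On six toy two-dimensional points forming two well-separated clusters labeled `"a"` and `"b"`, `jackknife_accuracy` returns 1.0. One-vs-rest is used because structural classes, subcellular locations, and membrane protein types are all multi-class problems. A library SVM with a nonlinear kernel may fit real wavelet features better. The linear solver here only keeps the sketch short and dependency-free.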