| Proteins, an important class of biological macromolecules and the main carriers of life activities, occupy a special position in living organisms. A protein must fold correctly into its particular native-state structure to perform its biological function. Understanding the mechanism of protein folding is one of the core problems in biological physics today; it is of scientific significance and also of considerable practical value in medicine and biological engineering. Revealing this mechanism is very challenging, and one key step is to identify the determinants of protein folding rates. Many parameters and methods have been proposed so far. However, their reported prediction accuracies are based on small data sets and depend strongly on the data used. Furthermore, the sequence order information, the influence of interactions between amino acids, and the sequence coupling effects have not been considered. Including this information in the prediction algorithm could both improve prediction quality and reveal insights into the folding process. This thesis therefore proposes several methods for predicting protein folding rates from amino acid sequences and analyzes the mechanism of protein folding. The main results are as follows: 1. A prediction method based on a genetic algorithm and a neural network. To capture sequence order information, a Gaussian weighting function is used to preprocess the encoded amino acid sequence. To keep the prediction algorithm from being trapped in a local optimum, the genetic algorithm is used to optimize the initial weights and biases of the neural network. In the jackknife test, the correlation coefficient between the experimental and predicted folding rates is 0.80, and the standard error is 2.65. 
The comparative results show that the proposed method performs better than several sequence-based methods. The method takes full advantage of the positional information of amino acid residues, and the results suggest that sequence order information affects protein folding rates to a certain extent. 2. A prediction method based on the pseudo-amino acid composition (PseAAC). To improve prediction accuracy and to search for and analyze the determinants of folding rates, the concept of pseudo-amino acid composition is used to predict protein folding rates for the first time. PseAAC represents a protein sequence with a discrete model without completely discarding the sequence order information. A correlation-based feature selection process is used to weed out redundant information. A linear regression model is built to predict the folding rates of 99 nonhomologous proteins. In the jackknife test, the correlation coefficient reaches 0.81 and the standard error is 2.46. 3. A prediction algorithm based on the n-order coupled composition. To take the interaction between amino acid residues into account, the n-order coupled composition is applied to folding rate prediction for the first time. The n-order coupled composition includes not only the main features of the traditional amino acid composition but also information on the interaction between amino acid residues. Because the dataset is small, the one-order coupled composition is used to represent a protein sequence. The effective features are selected according to the correlation coefficients between the feature factors and the folding rates, and the selected features are analyzed with a linear regression model. In the jackknife test, the final correlation coefficient is 0.88 and the standard error is 2.04. 
The method further confirms that sequence order information and the interaction between amino acid residues are important determinants of protein folding rates. 4. A prediction algorithm based on the Monte Carlo method. Because of the interaction between amino acids, the correlation-based feature selection method is affected by some subjective conditions. Using the Monte Carlo method to select the optimal factors is a more objective approach. In view of the computational cost, the protein sequence is represented by the pseudo-amino acid composition, and the protein folding rate is predicted with a singular value decomposition model. In the jackknife test, the correlation coefficient between the experimental and predicted folding rates is 0.83, and the standard error is only 2.39. Compared with the former subjective method, the proposed method improves significantly in both prediction accuracy and standard error. 5. A user-friendly web server is developed to make the proposed methods convenient to use. Once a protein sequence is submitted, the user directly obtains the predicted folding rate from the web server, and the tedious modeling steps are omitted completely. The web server makes it convenient for users to verify and compare the proposed methods and to use them in academic exchange. All the methods predict protein folding rates from amino acid sequences without any structural information. They represent the sequence from different viewpoints and build the feature vectors with different feature extraction algorithms. On their respective datasets, all the predictors perform well in the jackknife test. These results show that sequence order information and the interactions between amino acid residues are important determinants of protein folding rates. |
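
The sequence-only workflow summarized above (encode the sequence, fit a regression model, evaluate with the jackknife test) can be illustrated with a minimal Python sketch. It is not the thesis implementation. Several points are assumptions made for illustration: the "one-order coupled composition" is read here as the frequencies of adjacent residue pairs appended to the 20 single-residue frequencies, the regression uses a single feature rather than a selected feature set, and the function names `coupled_composition`, `fit_line`, `jackknife_predictions`, and `pearson` are hypothetical.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def coupled_composition(seq):
    """Return 20 residue frequencies plus 400 adjacent-pair frequencies.

    Assumption: one-order coupling is interpreted as pairs of neighbouring
    residues; the thesis may define the coupling differently.
    """
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    single = {aa: 0.0 for aa in AMINO_ACIDS}
    pair = {a + b: 0.0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for aa in residues:
        single[aa] += 1.0 / len(residues)
    n_pairs = len(residues) - 1
    for a, b in zip(residues, residues[1:]):
        pair[a + b] += 1.0 / n_pairs
    return {**single, **pair}


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknife_predictions(xs, ys):
    """Leave each sample out in turn, fit on the rest, predict the left-out one."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    return predictions


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


if __name__ == "__main__":
    # Synthetic numbers for illustration only; these are not folding-rate data.
    feature = [1.0, 2.0, 3.0, 4.0, 5.0]
    target = [3.1, 4.9, 7.2, 8.8, 11.0]
    predicted = jackknife_predictions(feature, target)
    print("jackknife correlation:", pearson(target, predicted))
```

In practice, one value from `coupled_composition` (or a selected subset, with a multivariate regression in place of `fit_line`) would serve as the feature for each protein, and the jackknife correlation would be computed against the experimentally measured folding rates.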