The Research Of Protein Secondary Structure Prediction Method Based On GEP And ANN

Posted on: 2010-02-15; Degree: Doctor; Type: Dissertation
Country: China; Candidate: Y C Wang
GTID: 1100360302974934; Subject: Agricultural Electrification and Automation
Abstract/Summary:
Protein structure prediction is a long-standing task and remains a challenge among the many branches of bioinformatics. Protein secondary structure prediction simplifies the prediction of protein 3D structure and serves as a step toward it. Any new breakthrough in this research will play an important role in determining and understanding the relationship between the spatial structure and the function of proteins. It will also be an important aid to applications such as guiding molecular design and biopharmaceutical development. With the aim of improving prediction accuracy, this dissertation studies three topics: a neural-network model for protein secondary structure prediction, the optimization of that neural-network model using gene expression programming (GEP), and the coding of amino acids. The contents and main conclusions are as follows:

(1) To address the premature convergence of GEP, an analysis of its principles yields two methods for improving the performance of the GEP algorithm: a dynamic mutation rate driven by evolutionary effects, and a local search operator. Results on typical experiments indicate that the proposed methods reconcile population diversity with convergence speed and improve the performance of the GEP algorithm.

(2) To improve the evolutionary efficiency of the GEP algorithm, a method called LFC is proposed that computes fitness using only a linear structure. Rather than dynamically constructing and releasing an expression tree, it computes fitness by scanning the chromosome twice with a linear structure. Experimental results indicate that, compared with other methods, LFC improves the performance of the GEP algorithm by increasing evolutionary speed and simplifying the implementation. Its advantage is more pronounced on complex problems.

(3) Choosing the structure and initial weights of a BP network is difficult, and BP training tends to get trapped in local optima. To address this, a method is proposed that uses the improved GEP to design BP networks, exploiting the algorithm's global search capability and high evolutionary efficiency. Because both the structure and the initial weights are optimized, the network starts learning from an optimized point with an optimized architecture. This shortens training time and improves the stability and generalization ability of the BP network. Experimental results show that optimizing BP networks with the improved GEP overcomes the randomness and instability introduced by choosing the network structure and initial weights. The method also finds simpler and more effective neural network architectures and converges quickly. On average it needs only 14 percent of the generations required by a genetic algorithm.

(4) An analysis of the physicochemical properties of amino acids and of how protein secondary structure forms leads to a new coding method based on the sequence arrangement and biochemical properties of amino acids. This coding method accounts for the influence of the physicochemical environment around each amino acid and adds a hydrophobicity value to the amino acid code. Experimental results show that the proposed Bin5+12 method is effective. Under the same experimental conditions it increases prediction accuracy by 1.47% over orthogonal coding and by 5.06% over 5-bit coding.

(5) Prediction accuracy of protein secondary structure from a single amino acid sequence is relatively low. To improve it, a GEP-ANN model for predicting protein secondary structure is built by combining the GEP-designed neural network with the proposed coding method. This model makes full use of the information in the protein's primary structure and improves the mapping accuracy of the neural network. The model was applied to 36 heterogeneous protein sequences (6122 residues) from PDBSelect25. It achieves a secondary structure prediction accuracy of 69.6%, which is 4.86% higher than the BP network model.

(6) Higher prediction accuracy is the enduring goal of protein secondary structure prediction. To improve on the accuracy achievable with a single network, this dissertation develops a new prediction model, the complex cascade neural network, which combines an evolving neural network with a neural network ensemble. Following neural-network-ensemble practice, this model forms the final result of the evolving neural network differently, so that it can exploit all the information contained in the whole population. A combination method assembles a subset of the individuals in the last generation to form the first level. The coding method of the second level is also improved to take full account of the first-level results. The model was applied to 36 heterogeneous protein sequences (6122 residues) from PDBSelect25. The results show that the proposed network model and coding method are effective, raising protein secondary structure prediction accuracy to 73.02%.
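Item (1) above mentions a mutation rate that adapts to evolutionary effects but does not give the rule. The following is a minimal sketch that assumes a hypothetical rule. Under that rule the rate resets to a base value whenever the best fitness improves, and it rises in capped steps once the best fitness has stagnated for a few generations. The function names `adapt_mutation_rate` and `mutation_schedule` are illustrative and do not come from the dissertation. The same applies to the parameters `base_rate`, `max_rate`, `step` and `patience` and their default values.

```python
from typing import List


def adapt_mutation_rate(rate: float, stall: int, base_rate: float = 0.05,
                        max_rate: float = 0.3, step: float = 0.02,
                        patience: int = 3) -> float:
    """Hypothetical adaptive rule (not the dissertation's exact formula)."""
    if stall == 0:
        # Best fitness just improved: fall back to the base rate to favour convergence.
        return base_rate
    if stall >= patience:
        # Best fitness has stagnated: raise the rate (capped) to restore diversity.
        return min(max_rate, rate + step)
    # Short stall: keep the current rate.
    return rate


def mutation_schedule(best_per_generation: List[float],
                      base_rate: float = 0.05) -> List[float]:
    """Return the mutation rate chosen after each generation's best fitness."""
    rate, stall, best = base_rate, 0, float("-inf")
    rates: List[float] = []
    for f in best_per_generation:
        if f > best:
            best, stall = f, 0
        else:
            stall += 1
        rate = adapt_mutation_rate(rate, stall, base_rate=base_rate)
        rates.append(rate)
    return rates
```

For example, `mutation_schedule([1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0])` keeps the base rate through short stalls. It raises the rate twice during the longer plateau and resets it when fitness improves again.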
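Item (2) above says that LFC computes fitness by scanning the chromosome twice with a linear structure instead of building an expression tree. The exact procedure is not given in this abstract. The sketch below shows one way a two-scan evaluation can work for a GEP gene in Karva (breadth-first) notation. It assumes a function set of binary `+ - * /` with protected division and single-letter terminals. The names `ARITY`, `apply_op`, `evaluate_karva` and `fitness` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed function set for illustration: symbol -> arity (terminals have arity 0).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2}


def apply_op(op: str, a: float, b: float) -> float:
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    # Protected division (assumed convention): return 1.0 when the divisor is 0.
    return a / b if b != 0 else 1.0


def evaluate_karva(gene: str, env: Dict[str, float]) -> float:
    """Evaluate a Karva-notation gene with two linear scans and no tree."""
    # Scan 1, left to right: in breadth-first order the children of node i take
    # the next ARITY[i] unused positions, so a running counter gives each node's
    # first child and, when the loop stops, the open reading frame (ORF) length.
    first_child: List[int] = []
    next_free, i = 1, 0
    while i < next_free:
        first_child.append(next_free)
        next_free += ARITY.get(gene[i], 0)
        if next_free > len(gene):
            raise ValueError("gene too short to close its open reading frame")
        i += 1
    orf_len = next_free  # symbols from here on are non-coding
    # Scan 2, right to left: children always sit after their parent, so a
    # reverse pass over the ORF computes every child before its parent.
    values = [0.0] * orf_len
    for j in range(orf_len - 1, -1, -1):
        if ARITY.get(gene[j], 0) == 0:
            values[j] = env[gene[j]]
        else:
            c = first_child[j]
            values[j] = apply_op(gene[j], values[c], values[c + 1])
    return values[0]


def fitness(gene: str, cases: Sequence[Tuple[Dict[str, float], float]]) -> float:
    """Hypothetical fitness: 1 / (1 + mean absolute error); 1.0 is a perfect fit."""
    err = sum(abs(evaluate_karva(gene, env) - target) for env, target in cases)
    return 1.0 / (1.0 + err / len(cases))
```

The gene `"+*-abca"` decodes to `(a*b) + (c-a)`. In a gene such as `"-ab+cab"`, the first scan stops after three symbols, so the non-coding tail is never visited. Memory use is two flat arrays whose length equals the ORF.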
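Item (4) above adds a hydrophobicity value to the residue code and takes neighbouring residues into account. The exact Bin5+12 layout is not given in this abstract. The sketch below therefore builds on the baseline that Bin5+12 is compared against, orthogonal (one-hot) coding over a sliding window. It appends one Kyte-Doolittle hydropathy value per residue. Several details are assumptions made for illustration: the window size of 13, zero padding beyond the chain ends, scaling by 4.5, and the names `encode_residue` and `encode_window`.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Kyte-Doolittle hydropathy scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def encode_residue(aa: str) -> List[float]:
    """20-bit orthogonal code plus one scaled hydropathy value (21 numbers)."""
    if aa not in KD:
        # Padding beyond the chain end (or an unknown residue): all zeros.
        return [0.0] * 21
    one_hot = [1.0 if aa == x else 0.0 for x in AMINO_ACIDS]
    return one_hot + [KD[aa] / 4.5]  # scale hydropathy into [-1, 1]


def encode_window(seq: str, center: int, window: int = 13) -> List[float]:
    """Concatenate residue codes for a window centred on `center` (assumed odd size)."""
    half = window // 2
    features: List[float] = []
    for pos in range(center - half, center + half + 1):
        aa = seq[pos] if 0 <= pos < len(seq) else "-"
        features.extend(encode_residue(aa))
    return features
```

`encode_window("MKTAYIAKQR", center=0)` gives 13 × 21 = 273 inputs. The first six residue slots are zero padding, and the centre slot holds the one-hot code for `M` followed by 1.9 / 4.5.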
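Item (6) above assembles some individuals from the final generation into a first level, whose results feed a second level. The combination method is not specified in this abstract. As a stand-in, the sketch below averages the per-residue H/E/C probability outputs of several first-level networks, and those averages could serve as second-level inputs. It also computes the per-residue three-state accuracy (Q3), a standard measure, although the abstract does not say whether its reported percentages use exactly this definition. The averaging rule, the three-state alphabet, and the names `combine_outputs`, `predict_states` and `q3` are assumptions.

```python
from typing import List, Sequence

STATES = "HEC"  # helix, strand, coil (assumed three-state alphabet)


def combine_outputs(member_probs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average class probabilities over members: member_probs[m][i][k]."""
    n_members = len(member_probs)
    n_res = len(member_probs[0])
    return [[sum(member_probs[m][i][k] for m in range(n_members)) / n_members
             for k in range(len(STATES))]
            for i in range(n_res)]


def predict_states(avg_probs: Sequence[Sequence[float]]) -> str:
    """Pick the most probable state per residue (first one wins on ties)."""
    return "".join(STATES[max(range(len(STATES)), key=lambda k: p[k])]
                   for p in avg_probs)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

Usage with two hypothetical members over three residues:

```python
m1 = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]
m2 = [[0.5, 0.1, 0.4], [0.4, 0.1, 0.5], [0.2, 0.2, 0.6]]
pred = predict_states(combine_outputs([m1, m2]))  # "HCC"
score = q3(pred, "HEC")                          # two of three residues correct
```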
Keywords/Search Tags: protein, amino acid, secondary structure prediction, neural network, gene expression programming, bioinformatics