
Research On Prediction Of Protein Structures Integrated Computational Intelligence

Posted on: 2012-08-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Liu
GTID: 1110330362954300
Subject: Computer system architecture
Abstract/Summary:
Proteomics is becoming an important research domain in the life sciences with the arrival of the post-genome era. Protein structure prediction plays a significant role in proteomics as a whole. It covers sequence preprocessing, protein secondary structure prediction, protein supersecondary structure prediction, protein contact map prediction, protein 3D structure prediction, and so on. This dissertation studies sequence preprocessing, protein secondary structure prediction and protein contact map prediction in depth.

Protein sequences are translated from DNA sequences, so the quality of DNA sequences is an important factor in the accuracy of protein structure prediction. Existing DNA sequence preprocessing tools are still inefficient at filtering and cleaning noise segments, and the probability of error increases significantly as the DNA sequence gets longer. This dissertation therefore investigates DNA sequence preprocessing.

Back-propagation (BP) neural networks have been widely used in protein secondary structure prediction, but they have defects such as slow convergence and local optimum traps. These defects reduce the accuracy of protein structure prediction and need to be addressed. Meanwhile, existing methods for protein secondary structure prediction are limited in feature representation: they consider only the basic composition of amino acids and therefore cannot represent all the necessary information. The hydrophobicity of amino acids and the interaction between amino acids that are far apart from each other have been ignored. An improved classification method for protein secondary structure prediction, based on a more complete feature representation, therefore needs to be explored.

The 3D structures of proteins are tightly associated with specific functions.
At present, it is very difficult to predict the 3D structures from the secondary protein sequences. Protein contact maps are a possible link between 3D structures and secondary structures. There is thus a need to predict the contact maps of proteins.

The main contributions of this dissertation are summarized as follows:

First, a novel DNA preprocessing method incorporating intelligent detection is proposed. This approach finds and locates contaminants automatically using statistical methods, random search and graph-theoretic operations, without extra background information such as vector sequences, splice sites and clone adapters. The new method can be used in a DNA data processing pipeline as an independent component tool.

Second, an improved dynamic tunneling neural network algorithm is proposed and applied to protein secondary structure prediction. Neural networks are prone to getting stuck in local traps. The dynamic tunneling technique helps a neural network escape local traps by "tunneling" and jumping into lower valleys of the objective function. However, the traditional dynamic tunneling technique searches in a single random direction and is therefore unstable. To improve search efficiency, the improved algorithm enhances stability by increasing the number of tunneling directions and by controlling the interaction between the trajectories of the tunneling system with an angle spring coefficient. Experimental results show that the improved algorithm outperforms both the traditional neural network and the traditional dynamic tunneling neural network in protein secondary structure prediction.

Third, comparative experiments have been carried out to test the influence of amino acid hydrophobicity and of the interaction between far-apart amino acids on protein secondary structure prediction.
Existing machine learning based methods for protein secondary structure prediction suffer from low prediction accuracy because they ignore amino acid hydrophobicity and the interaction between far-apart amino acids. A sequence of hydrophobic values can be built by replacing each amino acid with its hydrophobic energy value. Experiments show that a BP neural network using long sequences of amino acid hydrophobic energy values works well in predicting the E structure (β-strand), which is controlled mainly by long-range amino acid-amino acid interaction.

Fourth, this dissertation proposes a Co-training algorithm based on different protein features. The comparative experiments show that long-range amino acid-amino acid interaction plays a significant role in predicting the E structure (β-strand). Therefore, a Co-training algorithm is explored that is based on both the profile space and the hydrophobic energy value space, which are sufficient and redundant views. The proposed algorithm uses two classifiers: an SVM classifier trained in the profile space and a BP neural network classifier trained in the hydrophobic energy value space. Each predicts the secondary structure of an amino acid independently. If the two classifiers disagree on an amino acid, an arbitration rule proposed in this dissertation makes the final decision, based on an active selection strategy that follows the two classifiers' different priority levels. The experimental results show that the proposed algorithm achieves higher prediction accuracy than existing algorithms both for the E structure (β-strand), which is controlled mainly by long-range interaction, and for the H structure (α-helix), which is controlled mainly by short-range interaction.

Fifth, Markov Logic Networks are applied to protein contact map prediction for the first time. Markov Logic Networks (MLNs) are new Statistical Relational Learning models that combine Markov networks with first-order logic.
They are able to compute the probability distribution over possible worlds and to support inference. This dissertation introduces the theory, learning methods and inference algorithms of Markov Logic Networks and then applies them to protein contact map prediction. A discriminative learning algorithm is adopted for weight learning and the MC-SAT algorithm for inference. The dissertation also shows how to capture the essential features of different aspects of protein contact map prediction with a small number of predicate rules, and how to combine these rules to compose different models. Experimental results show that the method based on Markov Logic Networks performs better than the one based on conventional neural networks in protein contact map prediction.

This research provides a new solution for practical prediction problems of this kind.
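
How a Markov Logic Network turns a few weighted predicate rules into contact probabilities can be illustrated with a minimal sketch. In the standard MLN definition, the probability of a world is proportional to the exponential of the weighted count of true rule groundings. Everything specific below is an assumption made for illustration and is not taken from the dissertation: the two rules, their weights, and all function names are hypothetical. Exact enumeration of worlds is used here in place of MC-SAT, which is only feasible for toy examples.

```python
import itertools
import math


def count_hydrophobic_rule(hydrophobic, contacts):
    """Hypothetical rule: Hydrophobic(i) ^ Hydrophobic(j) => Contact(i, j).

    Returns the number of groundings (residue pairs) for which it holds.
    """
    return sum(
        1
        for (i, j), in_contact in contacts.items()
        if not (hydrophobic[i] and hydrophobic[j]) or in_contact
    )


def count_sparsity_rule(hydrophobic, contacts):
    """Hypothetical rule: !Contact(i, j), i.e. contacts are rare a priori."""
    return sum(1 for in_contact in contacts.values() if not in_contact)


# (weight, grounding counter) pairs; the weights are illustrative only.
RULES = [(1.5, count_hydrophobic_rule), (1.0, count_sparsity_rule)]


def world_score(hydrophobic, contacts):
    """Unnormalized probability of a world: exp(sum_i w_i * n_i(world))."""
    return math.exp(sum(w * n(hydrophobic, contacts) for w, n in RULES))


def contact_marginals(hydrophobic, pairs):
    """P(Contact(i, j) | evidence) by enumerating every assignment of contacts."""
    totals = {pair: 0.0 for pair in pairs}
    partition = 0.0
    for values in itertools.product([False, True], repeat=len(pairs)):
        contacts = dict(zip(pairs, values))
        score = world_score(hydrophobic, contacts)
        partition += score
        for pair in pairs:
            if contacts[pair]:
                totals[pair] += score
    return {pair: totals[pair] / partition for pair in pairs}


if __name__ == "__main__":
    evidence = {1: True, 2: True, 3: False}  # which residues are hydrophobic
    print(contact_marginals(evidence, [(1, 2), (1, 3)]))
```

In this toy model the pair of two hydrophobic residues receives a higher contact probability than the mixed pair, because only for that pair does the first rule reward a contact. Enumeration grows exponentially with the number of residue pairs, which is why sampling-based inference such as MC-SAT is used in practice.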
Keywords/Search Tags: DNA sequence, protein structure, neural network, co-training, Markov Logic Networks