A Study On Protein Contact Map Prediction Based On Clonal Selection Algorithm

Posted on: 2007-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: R X Wang
Full Text: PDF
GTID: 2178360182496155
Subject: Computer application technology
Abstract/Summary:
With the progress of genome sequencing programs since 1995, the life sciences have entered an era of information explosion. The sharp increase in biological information calls for batch processing by computer, which led to the birth of bioinformatics. Currently, the main research fields of bioinformatics are gene regulation and the study of protein structure and function, and protein structure prediction is the preliminary step of the latter. Secondary structure prediction has reached maturity, whereas prediction of the 3D structure of proteins is still at an early stage and needs further investigation.

There are various representations of the 3D structure of a protein, and the distance matrix of amino acid residues is commonly employed. A residue distance matrix reduced to ones and zeros is called a protein residue contact map: a "one" means that the distance between the two amino acid residues is smaller than a given threshold, and a "zero" means the opposite. For brevity, a protein residue contact map is usually just called a contact map. A contact map is a simple graphical representation of a protein's 3D structure; contact map prediction is therefore an intermediate step toward protein 3D-structure prediction.

Present protein structure prediction methods can be broadly classified into ab initio prediction based on the minimum-energy principle and methods that learn from information on related proteins. Each has its advantages and shortcomings. The energy minimization method is more adaptable and highly independent, but the energy function is hard to formulate. Even when a comparatively precise energy function is available, the large computational scale caused by numerous parameters, together with the tiny energy difference between conformations, which is only on the order of 1 kcal/mol, makes the prediction difficult.
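The contact map definition given earlier in the abstract (a residue distance matrix reduced to ones and zeros by a threshold) can be sketched as follows. This is a minimal illustration, not the thesis's code: the function name `contact_map`, the use of one 3D point per residue, the default threshold of 8 angstroms (chosen from the 6 to 10 angstrom range the abstract mentions), the choice to leave the diagonal at zero, and the toy coordinates are all assumptions.

```python
import math

def contact_map(coords, threshold=8.0):
    """Return a symmetric 0/1 matrix: 1 where two residues are closer than threshold.

    coords: one (x, y, z) point per residue, in angstroms (an assumption;
    the abstract does not say which atom represents a residue).
    The diagonal is left at 0, i.e. a residue is not in contact with itself.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Hypothetical toy chain of four residues placed along the x axis.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

Changing `threshold` changes which pairs count as contacts, which is consistent with the abstract's remark that results differ across thresholds.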
Prediction using correlative information is more precise, especially for homologous proteins, but it is heavily restricted by the database of known protein structures and is less universal. This thesis proposes a protein contact map prediction method that employs protein folding rules and a clonal selection algorithm. By deriving independent constraint rules from the characteristics of contact maps, the method removes the dependence on the existing protein structure database and achieves satisfactory precision.

Immune algorithms are an emerging class of algorithms that simulate the biological immune system by computer. One of them, the clonal selection algorithm, is widely used because of its adaptability, implicit parallelism and diversity. The clonal selection algorithm is derived by simulating the antibody production model. In the immune system, each antibody is cloned at a rate based on its affinity to the invading antigen and then mutates at a high frequency to generate better-adapted antibodies, which finally leads to the optimum solution. The fitness in the clonal selection algorithm thus represents this affinity between antibody and antigen. A fitness function is constructed in this thesis from protein folding restrictions such as:

1. Amino acids' hydrophobicity rule
The hydrophobic interaction inclines hydrophobic amino acids to fold into the interior of the protein, because such a conformation has lower free energy. This rule makes contacts between hydrophobic residues more probable. The hydrophobic cluster concept is introduced to simplify the energy model of hydrophobicity, transforming the 3D model into a 2D lattice model.

2. Secondary structure folding rules of protein
Different kinds of amino acids have different secondary structure propensities, and protein segments tend to fold into the secondary structure for which their propensity is greatest.

3. Number of contacts in the contact map
Statistics over contact maps show that the number of contacts in a contact map is strictly proportional to the sequence length, so a contact map with too many or too few contacts is inconsistent with reality.

4. Degree of vertex
Taking the amino acid residues as the vertices of an undirected graph and the contacts as its edges, statistics reveal a regularity in the vertex degrees of this graph. Each secondary structure has its own characteristic degree: for example, a residue in an alpha-helix has a degree of 1 or 2, whereas a residue in a beta-sheet usually has a degree of 2 or 3.

5. Other special rules
There are other special rules. For instance, cysteine readily forms disulphide bonds that stabilize beta-sheets and beta-turns. There are also rules arising from the regular arrangement of hydrophobicity.

Each intermediate solution generated by the clonal selection algorithm is penalized according to the restrictions above: the more rules it breaks, the less feasible it is in the real world and the larger the penalty it receives. It therefore has a higher probability of mutation, so that the next iteration produces a new solution more consistent with the biological characteristics of proteins, which in effect optimizes the prediction.

Tests on 200 non-homologous proteins in 4 groups of different lengths show that the algorithm has good adaptability and high efficiency: the average precision and coverage of each group are higher than 40% and 35% respectively, and the differences in precision and coverage between groups are less than 4%. Although the results differ considerably across thresholds from 6 to 10 angstroms, the mean precision is still greater than 35%. The execution time of a contact map prediction is no more than 2 minutes, with a mean of about 100 seconds.
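The penalty-driven search described above can be sketched as a simplified clonal selection loop. This is an illustration under stated assumptions, not the thesis's implementation: only rules 3 and 4 are encoded, the names `penalty`, `mutate` and `clonal_selection` are hypothetical, the contact target of one contact per residue (`contacts_per_residue=1.0`) is an invented placeholder, and clone counts and mutation rates are derived from penalty rank rather than from whatever formula the thesis uses.

```python
import random

# Allowed vertex degrees per secondary-structure label, following the abstract's
# example (helix "H": 1 or 2, sheet "E": 2 or 3). Other labels are unconstrained.
ALLOWED_DEGREE = {"H": (1, 2), "E": (2, 3)}

def penalty(cmap, ss, contacts_per_residue=1.0):
    """Total rule violation of a candidate contact map (lower is better)."""
    degrees = [sum(row) for row in cmap]
    contacts = sum(degrees) // 2
    # Rule 3: contact count should match a length-proportional target
    # (the ratio is a hypothetical placeholder).
    total = abs(contacts - round(contacts_per_residue * len(cmap)))
    # Rule 4: each residue's degree should lie in the range for its structure.
    for degree, label in zip(degrees, ss):
        if label in ALLOWED_DEGREE:
            low, high = ALLOWED_DEGREE[label]
            total += max(0, low - degree, degree - high)
    return total

def mutate(cmap, rate, rng):
    """Flip each residue pair with probability `rate`, keeping the map symmetric."""
    n = len(cmap)
    child = [row[:] for row in cmap]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < rate:
                child[i][j] = child[j][i] = 1 - child[i][j]
    return child

def clonal_selection(ss, pop_size=10, max_clones=5, generations=50, seed=0):
    """Search for a contact map with low penalty for secondary structure `ss`."""
    rng = random.Random(seed)
    n = len(ss)
    empty = [[0] * n for _ in range(n)]
    population = [mutate(empty, 0.5, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: penalty(m, ss))
        next_population = []
        for rank, parent in enumerate(population):
            # Better candidates (low rank) get more clones; worse candidates
            # (high penalty) get a higher mutation rate.
            n_clones = max(1, max_clones * (pop_size - rank) // pop_size)
            rate = 0.02 + 0.2 * rank / pop_size
            candidates = [parent] + [mutate(parent, rate, rng) for _ in range(n_clones)]
            next_population.append(min(candidates, key=lambda m: penalty(m, ss)))
        population = next_population
    return min(population, key=lambda m: penalty(m, ss))

# Hypothetical 8-residue segment: four helix residues followed by four sheet residues.
best = clonal_selection("HHHHEEEE", seed=1)
```

Because each parent competes with its own clones, the best penalty in the population never increases from one generation to the next. A fuller version would add the hydrophobicity and secondary structure rules as further penalty terms.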
Keywords/Search Tags: Prediction