Font Size: a A A

Applying And Improving Of Clustering Algorithm In Bio-molecular Evolution Area

Posted on:2015-03-11Degree:MasterType:Thesis
Country:ChinaCandidate:G B LiFull Text:PDF
GTID:2298330452958005Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Construction of biological evolutionary tree is an important part of molecular biology, itobtains the homology relationships between these biomoleculars through comparing molecularsuch as proteins and DNA, then using this kind of homology relationships to constructsequence-tree of species’ evolution, namely, phylogenetic tree. Research of biological evolutiontree remains a long history, which has a huge influence on biomolecular’ developments.Biological evolution tree makes it possible for mankind to take a direct look at the evolutionarysequence with the most possibility, which can be used to plainly explain Darwin’ evolution ofbiology. Since its division method is different from each other, bio-evolution tree has acollection of construction ways. This paper divides this collection of construction methods intotwo parts on whether a method is based on a distance matrix. Methods based on distanceusually have a higher efficiency in bio-evolution tree’ construction, and the obtained tree isreliable after verification, this paper centers on the research of evolution-trees based on distancematrix.Traditional methods of evolution-tree construction based on distance includes NJ,UPGMAand so on, of which UPGMA has an assumption that evolving rate of each creature is equal,which is not valid in the real world. When constructing bio-evolution tree, NJ method not onlyconsiders the evolving rate between creatures, but also ensures that each division group’s sumof distance is the least, which makes it a commonly used bio-evolution tree constructionmethod. This paper improves some disadvantages of NJ method, which makes it fit for moresituations.Three aspects of original NJ tree construction method are improved:(1)originalJukes-Cantor distance measure method is improved, here introducing mark-matrix to marksequence alignment results, which not only considers nucleotide variation in information point,but also mark DNA bases’ diversity in non-information points to fully take use of DNAsequences’ heredity information. Pair-alignments between DNA sequences is changed tomultiple-alignments to take advantage of hidden information of DNA, further reducing thepossibility of constructing an unreasonable evolution tree;(2)For distance matrix not meet theadditivity of distance, here introducing variance and covariance to help judge theto-be-classified two units. Improved distance evolution method can lift the accuracy of distanceestimation when missing some distance information, this enhances robustness of distanceestimation method and makes it adapt to circumstances with bigger noises, for this new methodhas some degree of anti-interference ability;(3)Each iteration of greedy-algorithm tries to find adistance-least classified pair, which is a local optimum method needs to be improved. Byintroducing “domain” pair, suboptimal solution produced in each iteration can be used, whichmakes later evolution tree possible to meet the requirement of global optimal. Experiments show that algorithm improving the greedy trait has the most possibility of obtaining globaloptimal evolution tree and it reduces time-complexity.This paper also introduces a new evolution tree construction algorithm based onhierarchical cluster and another algorithm based on Kmeans cluster. Experiments confirm thatthese two kinds of evolution construction methods are reliable and time cost is close to otherdistance-based evolution tree construction methods.Distance matrix method based on hierarchical clustering firstly computes the evolutiondistance between creatures by sequence alignments, then it classifies the bio-sample withhierarchical clustering and ultimately transforms the classification results into binary-tree.Bio-evolution tree construction algorithm based on Kmeans clustering computes thedistance among the sample, then classifying the collection with Kmeans algorithm into manycategories, which is used for bio-evolution tree’s construction by Hoffman algorithm, at last, bycombining every child-tree into a complete binary-tree, a bio-evolution tree is found.Besides, this paper creatively introduces Two-step cluster in Bayes method for treeconstruction, which pre-processes the DNA sequence before constructing the evolutionary tree.The accuracy of this method is validated by comparing it with other credible evolutionary treeconstruction methods.As species’ real evolution history is obtained only through researching on fossils, manyspecies’ evolution history can be validated only by biologists’ experiments, a unique standardfor assessing bio-evolution tree is not close at hand. Generally speaking, when a topologystructure is similar with an already validated bio-evolution tree by other methods, this topologytree’s construction method is deemed effective.
Keywords/Search Tags:evolution-tree’s construction, hierarchical clustering, distance matrix, Kmeanscluster, Bayes method
PDF Full Text Request
Related items