
Research On Analysis And Evolution Model Of Biological Sequences

Posted on: 2013-01-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Jie
GTID: 1110330374968802
Subject: Animal breeding and genetics and breeding
Abstract/Summary:
Since Darwin's era, many biologists have dreamed of reconstructing the evolutionary history of all species and describing it with a phylogenetic tree. With the development of molecular biology and biotechnology, the data resources of biological science have expanded rapidly. Researchers began to infer the evolutionary history of living beings from biological data instead of phenotype, which gave rise to molecular systematics. Biological sequences and sequence evolution models are the key components of molecular systematics. The traditional method for analyzing biological sequences is sequence alignment; as its complement, alignment-free methods emerged more than 20 years ago and have become a hot topic in computational molecular biology. This dissertation takes biological sequences as its research subject, puts forward some new methods for analyzing them, and studies models of the sequence evolution process in order to provide new results for improving evolutionary distance estimation. The main content covers the following aspects:

1. New graphical representations of protein sequences were put forward. In the first graphical representation method, the 20 amino acids were divided into three groups according to their hydropathy, and the resulting reduced amino acid sequence was analyzed. Three curves were defined, namely the IA curve, the EA curve and the IE curve. The three curves not only make the reduced sequence visible, but also allow the distributions of the three types of amino acids to be compared. To quantify the sequence, we first introduced conditional probability as a numerical characterization of the sequence, so as to probe the correlation information hidden in it. The similarity/dissimilarity between protein sequences was then analyzed on the basis of conditional probability.
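The first method can be illustrated with a minimal sketch. The abstract does not list the three groups or the exact form of the conditional probability, so everything specific here is an assumption: the grouping (I = internal, E = external, A = ambivalent) is one commonly used hydropathy classification, the conditional probability is taken between adjacent positions, and the Euclidean `dissimilarity` is a hypothetical stand-in for whatever measure the dissertation uses.

```python
from collections import Counter

# ASSUMPTION: this three-group split is not given in the abstract.
# I = internal (hydrophobic), E = external (hydrophilic), A = ambivalent.
GROUPS = {
    "I": "FILMV",
    "E": "DEHKNQR",
    "A": "ACGPSTWY",
}
AA_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(protein):
    """Map a protein sequence onto the three-letter alphabet {I, E, A}."""
    return "".join(AA_TO_GROUP[aa] for aa in protein.upper())


def conditional_probabilities(reduced):
    """Estimate P(next = y | current = x) from adjacent positions.

    ASSUMPTION: first-order (adjacent-pair) conditional probability.
    Groups that never appear as a predecessor are omitted.
    """
    pair_counts = Counter(zip(reduced, reduced[1:]))
    first_counts = Counter(reduced[:-1])
    return {
        (x, y): pair_counts[(x, y)] / first_counts[x]
        for x in "IEA"
        for y in "IEA"
        if first_counts[x] > 0
    }


def dissimilarity(p, q):
    """Euclidean distance between two conditional-probability tables
    (hypothetical measure, chosen only for illustration)."""
    keys = set(p) | set(q)
    return sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys) ** 0.5


if __name__ == "__main__":
    reduced = reduce_sequence("MKVLA")
    print(reduced)
    print(conditional_probabilities(reduced))
```

Two sequences with identical conditional-probability tables get a dissimilarity of zero under this sketch, and larger values indicate more different group-to-group patterns.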
The second method directly used the hydropathy of amino acids to define the hydropathy curve of an amino acid sequence, and analyzed the similarity between different sequences by traditional quantitative analysis of graphs. Using ND6 (NADH dehydrogenase subunit 6) protein sequences as biological data, a comparative analysis of the two methods was carried out, which indicated that both are feasible and effective.

2. A selection evolution distance formula for protein sequences was proposed. Under the premise of selection, a dynamic equation for amino acid sequences was built, from which the selection evolution distance was obtained. The parameter of the model was given. The amino acid sequences of cytochrome b from 17 species were taken as an example to illustrate the new evolution distance, and evolutionary trees based on different evolution distances were compared using the bootstrap. The results showed that the topology of the tree based on the selection evolution distance was consistent with that based on other distances, and that estimating the selection evolution distance avoids having to choose among different amino acid substitution models.

3. Indexes to quantify the nucleotide distribution of DNA sequences were proposed. Distance distribution models for the 4 single nucleotides and 16 dinucleotides in a DNA sequence were provided. Based on these, the average distance and the relative distance entropy were put forward to characterize the distribution of each single nucleotide or dinucleotide in a DNA sequence. The average distance describes the number of other nucleotides between two neighboring occurrences of the same single nucleotide or dinucleotide, while the relative distance entropy describes how even the distance distribution is. As an application, we analyzed the nucleotide distribution of the mitochondrial genome sequences of 17 species.
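The two indexes just described can be sketched as follows. The abstract gives no formulas, so the conventions here are assumptions: distance is measured from one start position to the next (for a single nucleotide, the number of other nucleotides in between is this value minus 1), and the relative distance entropy is taken as the Shannon entropy of the distance distribution divided by its maximum for the number of distinct distances observed. The function names are hypothetical.

```python
import math
from collections import Counter


def occurrence_distances(seq, word):
    """Distances between consecutive occurrences of `word` in `seq`.

    ASSUMPTION: distance = difference between start positions, with
    overlapping occurrences counted. `word` may be a single nucleotide
    ("G") or a dinucleotide ("CG").
    """
    starts = [
        i for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i)
    ]
    return [b - a for a, b in zip(starts, starts[1:])]


def average_distance(distances):
    """Mean distance; NaN when the word occurs fewer than twice."""
    return sum(distances) / len(distances) if distances else float("nan")


def relative_distance_entropy(distances):
    """Shannon entropy of the distance distribution, scaled to [0, 1].

    ASSUMPTION: normalised by log(number of distinct distances observed),
    so 1 means all observed distances are equally frequent.
    """
    counts = Counter(distances)
    if len(counts) < 2:
        return 0.0
    n = len(distances)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(len(counts))


if __name__ == "__main__":
    seq = "GACGTTGCG"
    for word in ("G", "CG"):
        d = occurrence_distances(seq, word)
        print(word, d, average_distance(d), relative_distance_entropy(d))
```

Because the entropy depends only on how often each distance occurs, shifting every distance by a constant (for example, counting nucleotides in between rather than start-to-start) changes the average distance but not the relative distance entropy in this sketch.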
In the mitochondrial genome sequences, the average distance and relative distance entropy of nucleotide G were greater than those of the other nucleotides, and the average distance of dinucleotide CG was larger than that of the other dinucleotides.

4. A transition-transversion evolutionary model of DNA sequences was built. On the basis of a dynamic model of nucleotide substitution, the dissertation analyzed how the frequencies of identical pairs, transition pairs and transversion pairs between nucleotide sequences change over time. The characteristics of the transition/transversion ratio and of evolutionary distances were also analyzed. The transition-transversion evolutionary model was put forward. A method for estimating the parameters of the model and its biological significance were also provided. Compared with the traditional dynamic model of nucleotide substitution, this model is simpler to establish, because it can be obtained directly from the nucleotide substitution matrix or the nucleotide substitution path map. With this model, the evolutionary distance between DNA sequences is easily obtained.
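The three site-pair classes in item 4 can be counted directly from two aligned DNA sequences, as in the sketch below. Here "transition" is the standard term for a purine-to-purine or pyrimidine-to-pyrimidine change, and "transversion" for a change between the two classes. The abstract does not give the dissertation's own distance formula, so the distance shown is the well-known Kimura two-parameter (K2P) distance, used only as a stand-in; the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pair_frequencies(seq1, seq2):
    """Return the fractions of identical, transition and transversion
    site pairs between two aligned sequences of equal length.
    Sites containing anything other than A, C, G, T are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    identical = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base
        if a == b:
            identical += 1
        elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    total = identical + transitions + transversions
    if total == 0:
        raise ValueError("no comparable sites")
    return identical / total, transitions / total, transversions / total


def k2p_distance(p, q):
    """Kimura two-parameter distance from the transition fraction p and
    the transversion fraction q. NOTE: a standard stand-in, not the
    model proposed in the dissertation."""
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    same, p, q = pair_frequencies("ACGTACGTAC", "ACGTGCGTCC")
    print(same, p, q, k2p_distance(p, q))
```

The K2P formula is undefined when `1 - 2p - q` or `1 - 2q` is not positive, which happens for highly diverged sequences; this sketch does not guard against that case.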
Keywords/Search Tags: DNA sequence, protein sequence, alignment-free method, evolutionary distance, evolutionary model