
Constructing Phylogenetic Trees From DNA Sequences Based On The Positions Of Common Prefix Identifiers

Posted on: 2022-01-01 | Degree: Master | Type: Thesis
Country: China | Candidate: C Lu
GTID: 2480306494456494 | Subject: Applied Mathematics
Abstract/Summary:
Since 1990, with the implementation and completion of the Human Genome Project, biological data have been growing exponentially. Mining useful biological information by analyzing and processing these data is a meaningful and challenging task, and bioinformatics emerged against this background. Similarity analysis of biological sequences is an important research topic in bioinformatics. Sequence analysis methods can be divided into alignment methods and alignment-free methods. Because traditional alignment methods have certain limitations, alignment-free methods, which complement and extend them, have appeared and quickly become one of the hotspots of molecular biology research.

This dissertation proposes a new alignment-free model, the SPD (standardized position difference) model. The SPD model takes into account that mammalian mitochondrial genomes are circular duplex DNA molecules. First, based on the common prefix identifier, the SPD model obtains the prefix identifier of each position in the circular DNA sequence. Next, the SPD model compares the prefix identifier sets of each pair of sequences, extracts the standardized position of each common prefix identifier in each sequence, and calculates the position difference of each common prefix identifier between the two sequences. Finally, the similarity between two sequences is the average of these position differences, with a coefficient applied.

The experimental dataset consists of all mammalian complete mitochondrial genome sequences, 1050 in total, downloaded from the GenBank database at NCBI on December 3, 2020. The sequences come from 27 orders, 129 families and 491 genera. The numbers of orders, families and genera containing at least two sequences are 22, 80 and 172, respectively. The SPD model is used to calculate the similarity among the 1050 mammalian complete mitochondrial genome sequences, and the neighbor-joining method is then used to construct a phylogenetic tree. The clustering accuracy rates of the resulting phylogenetic tree at the order, family and genus levels are 77.27%, 95.00% and 76.74%, respectively. In addition, the orders Cetartiodactyla, Carnivora and Perissodactyla, which have more than 100 sequences, form independent branches with an accuracy rate of 100.00%.

To date, no published study has been found that performs an evolutionary analysis of all mammalian complete mitochondrial genome sequences. Most articles analyze the mitochondrial genome sequences of some mammals from each order, or the mitochondrial gene sequences of all mammals from a single order. Comparison with related research results shows that the phylogenetic relationships among these mammals reconstructed by the SPD model are almost consistent with standard biological classification.
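The SPD steps described in the abstract can be illustrated with a toy sketch. The abstract does not define the prefix identifier, the coefficient, or the exact form of the position difference, so everything below is an assumption made for illustration: a prefix identifier is taken to be the shortest substring starting at a position that occurs exactly once in the circular sequence, the standardized position is the start index divided by the sequence length, and the coefficient is omitted so the result is a plain mean. The function names `prefix_identifiers` and `spd_distance` are hypothetical, and the brute-force search is only suitable for very short sequences, not for full mitochondrial genomes.

```python
def prefix_identifiers(seq):
    """Map each assumed prefix identifier to its start position.

    Assumption: the identifier of position i is the shortest substring
    starting at i that occurs exactly once in the circular sequence.
    Positions with no such substring (periodic sequences) are skipped.
    """
    n = len(seq)
    doubled = seq + seq  # lets substrings wrap around the circle
    ids = {}
    for i in range(n):
        for k in range(1, n + 1):
            sub = doubled[i:i + k]
            # Count occurrences over the n circular start positions.
            count = sum(1 for j in range(n) if doubled[j:j + k] == sub)
            if count == 1:
                ids[sub] = i
                break
    return ids


def spd_distance(a, b):
    """Mean standardized position difference over common identifiers.

    Assumption: the difference is |i/len(a) - j/len(b)| and no
    coefficient is applied. Returns None if nothing is shared.
    """
    ids_a = prefix_identifiers(a)
    ids_b = prefix_identifiers(b)
    common = ids_a.keys() & ids_b.keys()
    if not common:
        return None
    diffs = [abs(ids_a[s] / len(a) - ids_b[s] / len(b)) for s in common]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    print(prefix_identifiers("AACG"))
    print(spd_distance("ACGT", "ACGT"))
    print(spd_distance("ACGT", "CGTA"))
```

Under these assumptions, a pairwise matrix of such values over all sequences would be the input to a neighbor-joining implementation. Note that this simple position difference depends on where each circular genome is linearized; how the thesis handles that is not stated in the abstract.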
Keywords/Search Tags:Alignment-free Method, Phylogenetic Tree Analysis, Complete Mitochondrial Genome, Prefix Identifier