Evolutionary Analysis Of Viral Genome Fragments

Posted on: 2022-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y He
Full Text: PDF
GTID: 2480306329488944
Subject: Biophysics
Abstract/Summary:
Genes carry the information that governs the growth and development of every living organism. As the blueprint of life, genetic information constrains and regulates organisms so that they can survive and reproduce. Over the course of life, genomes undergo mutation and recombination for many reasons, and each such event alters the information in genes to some degree. Gene mutation is a random process with no fixed direction: each base in a gene can mutate into any of the three other bases, so the space of possible sequences grows exponentially with sequence length. From a mathematical point of view, all existing biological genomes lie on low-dimensional manifolds within this high-dimensional space. If computational methods can reveal properties of these manifolds, we will better understand the evolutionary relationships among the vast and complex collection of genomes, and among the species they belong to.

With the continuous progress of sequencing technology, sequencing data for many species has grown explosively. Our understanding of the genetic code keeps deepening, and researchers have tried many methods to interpret this information and to trace species evolution in the massive sequencing data. These efforts have improved our understanding of how species grow and evolve. However, many problems remain: computing resources become insufficient as data grows; the range of a calculation is too small to reveal higher-level correlations between genes; accuracy has to be reduced as the data scale increases; and time and computing resources are wasted on small-scale, repetitive calculations.

In this thesis, we use the manifold evolutionary graph algorithm to analyze evolutionary relationships among a large number of genomic fragments. Its basic principle is to split each genome sequence in the genome database into short fragments, mutate every fragment, and search the database for identical or similar sequences; each original fragment is then connected to the matching target sequence in a graph. To search and cluster short fragments we use the component rank vector code (CRV code): the words in a short fragment are ranked by frequency, encoded by the order and frequency of the words, and stored in a multiway tree. A search first looks up the code of the mutated target fragment and then locates the target fragment itself. The manifold evolutionary graph algorithm can find evolutionary correlations across a large number of genomes and can handle datasets that mix different kinds of organisms, such as bacteria and viruses. In this way, evolutionary relationships between genomes can be observed over a wider range. In this research, we give a preliminary introduction to CRV coding and the manifold evolutionary graph algorithm and apply them to a small dataset. We hope the preliminary results verify the effectiveness of the algorithm and thereby provide a firm basis for the further development of related methods.
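
As a rough illustration of the split, mutate, search and connect steps described in the abstract, the following Python sketch may help. It is not the thesis implementation. The overlapping-window splitting, the restriction to single-base substitutions, the exact-match lookup in a plain dictionary (in place of the CRV code and multiway tree) and all function names are assumptions made only for this example.

```python
from collections import defaultdict

BASES = "ACGT"


def split_fragments(genome, length):
    """Return every overlapping window of `length` bases.

    Overlapping windows are an assumption; the abstract only says that
    genomes are split into short fragments.
    """
    return [genome[i:i + length] for i in range(len(genome) - length + 1)]


def single_base_mutants(fragment):
    """Yield every sequence that differs from `fragment` by one substitution.

    Each position has three alternative bases, so a fragment of length L
    has 3 * L such neighbours.
    """
    for i, base in enumerate(fragment):
        for alt in BASES:
            if alt != base:
                yield fragment[:i] + alt + fragment[i + 1:]


def build_graph(genomes, length):
    """Index all fragments, then link fragments that are one substitution apart.

    `genomes` maps a (hypothetical) genome name to its sequence string.
    Returns the fragment index and a set of undirected edges.
    """
    index = defaultdict(set)  # fragment -> names of genomes containing it
    for name, sequence in genomes.items():
        for fragment in split_fragments(sequence, length):
            index[fragment].add(name)

    edges = set()
    for fragment in index:
        for mutant in single_base_mutants(fragment):
            if mutant in index:  # the mutated fragment exists in the database
                edges.add(tuple(sorted((fragment, mutant))))
    return index, edges


if __name__ == "__main__":
    toy_genomes = {"g1": "ACGTA", "g2": "ACGAA"}  # made-up toy sequences
    index, edges = build_graph(toy_genomes, length=4)
    for a, b in sorted(edges):
        print(a, "<->", b, "shared by", sorted(index[a] | index[b]))
```

The dictionary lookup only finds exact matches of a mutated fragment. The abstract's CRV code is described as also supporting searches for similar sequences, which this sketch does not attempt.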
Keywords/Search Tags: Bioinformatics, viral genome, multiple sequence alignment, evolutionary biology