Evaluating the Possibility of Virus Infection of Host Cells Using Alignment-Free Sequence Statistics

Posted on: 2019-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: X Zang
GTID: 2370330566986443
Subject: Condensed matter physics
Abstract/Summary:
Although viruses play a very important role in human health and waste degradation, they are difficult to study. Most viruses cannot be cultivated in the laboratory, and their genetic sequences are often hard to identify because their genes are small and evolve rapidly. With the development of next-generation sequencing technology, we can explore the nature and origin of life in greater depth, and biological databases have grown rapidly in number and size. Through in-depth analysis of large amounts of biological data, we can better understand the diversity of viral genomes, the interactions between viruses and their host cells and environment, and the pathogenicity of pathogens. Studying how viruses infect host cells is of great importance for understanding the function and dynamics of microbial communities.

A virus relies on the molecular machinery of its host cell to replicate and produce progeny virus particles. Existing data show that a virus and its host cells have similar word patterns (K-tuples) in their genetic information. When the DNA sequence of a virus is scored against the DNA sequence of a host cell it can infect, the word-pattern score is often higher than the score against a randomly chosen host cell, indicating that the virus shares some sequence similarity with the hosts it infects. This similarity can be examined by comparing the nucleic acid sequences of the virus and the host cell, and alignment-free sequence comparison methods based on the frequency distribution of K-tuples have therefore gradually come to prominence. The frequency distribution of K-tuples describes the sequence characteristics of a species, which provides a theoretical basis for comparing microbial communities with K-tuple algorithms. In such a comparison, a microbial community is represented only by its K-tuple frequency vector and does not depend on a reference genome. Comparing two sequences means comparing their K-tuple feature vectors to obtain a score that represents the distance between the two sequences, and then using that score to judge their similarity or dissimilarity. Current alignment-free sequence comparison methods based on the K-tuple frequency distribution include Eu, d2star, d2S, Hao, and Ch.

In this thesis, these five alignment-free sequence comparison methods (Eu, d2star, d2S, Hao, and Ch) are used to analyze virus recognition. First, the advantages and disadvantages of the five algorithms for virus identification were compared using ROC curves and AUC histograms. The five algorithms were also analyzed with respect to global and local alignment. The d2star and d2S methods were found to perform better than the other three methods and to be more suitable for virus research. The d2star and d2S methods were then used to investigate virus clustering, and d2S was found to cluster the viruses better than d2star. Finally, d2star and d2S were used to score the DNA sequence of a virus against the DNA sequence of a host cell. Comparing this score with the threshold obtained makes it possible to judge whether the virus can invade the host cell, which provides an effective method for exploring the correlation between viruses and host cells.

Discussion of the similarity between viruses and host cells based on alignment-free sequence comparison remains relatively rare. Alignment-free sequence comparison based on K-tuples can accelerate metagenomic research and the search for viruses that are still unknown to the scientific community. It may become a new tool for exploring the huge and largely unknown diversity of viruses on Earth.
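The K-tuple comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that Eu denotes the Euclidean distance and Ch the Chebyshev distance between K-tuple frequency vectors, as is common in the alignment-free literature; d2star and d2S additionally normalize counts against a background sequence model and are not reproduced here. The function names, the `threshold` parameter, and the decision to count one strand only are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product
import math


def ktuple_frequencies(seq, k):
    """Return relative frequencies of all 4**k DNA K-tuples, in a fixed A/C/G/T order."""
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing other symbols (e.g. N) are ignored.
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no valid K-tuple of length k")
    return [counts[w] / total for w in words]


def eu_distance(f, g):
    """Euclidean distance between two frequency vectors (assumed meaning of 'Eu')."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def ch_distance(f, g):
    """Chebyshev distance between two frequency vectors (assumed meaning of 'Ch')."""
    return max(abs(a - b) for a, b in zip(f, g))


def predicts_infection(virus, host, k, threshold, distance=eu_distance):
    """Hypothetical decision rule: predict infection when the distance is at most threshold."""
    d = distance(ktuple_frequencies(virus, k), ktuple_frequencies(host, k))
    return d <= threshold
```

In practice the threshold would be chosen from labelled virus-host pairs, for example from the ROC analysis mentioned in the abstract. The value passed to `predicts_infection` is an input to this sketch, not a result reported by the thesis.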
Keywords/Search Tags:virus, host cell, d2star, d2S, K-tuple, clustering