Establishment And Application Of A Sequencing-platform-wide Computational Pipeline For Metagenomic Virus Identification

Posted on: 2022-01-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Li
GTID: 1480306344971549
Subject: Pathogen Biology
Abstract/Summary:
Viruses can cause not only infectious diseases but also public health emergencies. Despite significant advances in our understanding of the pathogens behind infectious diseases, existing diagnostic assays often fail to identify the etiologic pathogen in cases of unknown or suspected infection. Currently, the most widely used methods for pathogen identification are molecular diagnostic techniques that target pathogen-specific DNA or RNA fragments, such as polymerase chain reaction (PCR). These methods require a priori knowledge of the candidate pathogen. However, the viruses that cause public health emergencies are often unknown, which poses a great challenge to existing molecular diagnostic techniques. Metagenomic next-generation sequencing (mNGS), which enables universal pathogen identification through comprehensive analysis of microbial and host genetic material (DNA and/or RNA) in patient samples, is maturing and moving rapidly from research into practice. Unlike bacteria, fungi and other pathogens, most viral pathogens have RNA genomes; mNGS for viruses is therefore referred to as Viral mNGS. Despite the recent successes of mNGS, fast and accurate interpretation of its findings remains challenging and is an active research focus. To address the need for improved interpretation of both virus and host data in Viral mNGS, this study comprised the following five chapters.

Chapter 1: Development of the Virus Identification Pipeline 2 (VIP2). VIP2 enables comprehensive analysis of viral reads from all sequencing platforms through k-mer based classifiers. VIP2 outperformed both its predecessor VIP and competing tools in speed and accuracy. VIP2 has been rigorously tested across multiple clinical sample types representing a variety of infectious diseases, and it contributed to the first three complete genomes of 2019-nCoV (EPI_ISL_402119, EPI_ISL_402120, EPI_ISL_402121). In addition, VIP2 has been recognized by independent users. Certifications of use were received from domestic institutes, including the Beijing and Shanghai CDCs, and from international users, such as the French National Institute for Agriculture Research and the Nigerian Institute of Medical Research.

Chapter 2: Comparison of third-generation sequencing approaches to identify viral pathogens under public health emergency conditions. The ability of Viral mNGS to detect known and unknown viruses in a timely manner makes it a powerful tool for public health emergency response. Emerging third-generation sequencing (TGS) offers advantages in speed and read length over second-generation sequencing (SGS). Here, we present end-to-end workflows for both Oxford Nanopore MinION and PacBio Sequel applied to a viral disease emergency event, with Ion Torrent PGM as a reference. With the help of VIP2, all three platforms successfully identified Norovirus G? and recovered at least 85% of its genome. Oxford Nanopore MinION had the shortest sample-to-answer turnaround time, with relatively low accuracy that was nevertheless sufficient for taxonomic classification. PacBio Sequel recovered the most accurate viral genome but took the longest time. Overall, Nanopore metagenomics can rapidly characterize viruses, and PacBio Sequel can accurately recover viral genomes.

Chapter 3: Integrating host response and metagenomic detection in cerebrospinal fluid from unexplained encephalitis cases. Diagnosis of unexplained infections plays a key role in clinical treatment. We applied both random amplification and virus sequence independent targeted amplification (VSITA) to cerebrospinal fluid from five patients with unexplained encephalitis, followed by interpretation of both viral reads and the host response. Confident reads of cytomegalovirus (CMV) were detected. Quantitative reverse transcription PCR (qRT-PCR) validated this finding (Ct values: 30.23, 32.83 and 34.08, respectively). Enrichment analysis of the differentially expressed genes suggested upregulation of the pathway "Human cytomegalovirus infection" (P=0.213). Taken together, CMV was identified as the cause of the unexplained encephalitis cases.

Chapter 4: OASL as a diagnostic marker for influenza infection, revealed by integrative bioinformatics analysis with XGBoost. Acute respiratory infection (ARI), caused by either viruses or bacteria, is one of the most common reasons for people to seek medical care. Annually, influenza is responsible for one out of five people suffering from ARI. Host response biomarkers offer a promising alternative diagnostic approach for identifying which ARI patients are infected by influenza. However, the published multi-gene panels have proven problematic in clinical practice. Clinical implementation would be facilitated by simplifying these assays: 1) by reducing the number of biomarkers required for an optimal diagnosis and 2) by using molecular platforms that are easier to operate in clinical laboratories and have a faster turnaround time. The present study addressed these challenges. OASL, a single biomarker for influenza diagnosis, was identified and evaluated on existing publicly available datasets by integrative bioinformatics analysis with a machine learning algorithm. In addition, the expression of OASL, along with universal influenza detection, can be measured by the widely available real-time RT-PCR technologies used in the clinical setting. It is therefore now possible to design a prospective validation study across a variety of clinical settings to definitively determine the accuracy of the test.

Chapter 5: Abnormal upregulation of the cardiovascular disease biomarker PLA2G7 induced by proinflammatory macrophages in COVID-19 patients. A high rate of cardiovascular disease (CVD) has been reported among patients with coronavirus disease 2019 (COVID-19). Importantly, CVD, as a comorbidity, can also increase the risk of severe COVID-19. Here we identified phospholipase A2 group VII (PLA2G7), a well-studied CVD biomarker, as a hub gene in COVID-19 through an integrated hypothesis-free genomic analysis of nasal swabs (n=486) from patients with COVID-19. PLA2G7 was further found to be predominantly expressed by proinflammatory macrophages in the lungs, which emerge as COVID-19 progresses. In the validation stage, PLA2G7 RNA was detected in nasal swabs from both COVID-19 and pneumonia patients, but not in those from healthy individuals. The positive rate of PLA2G7 correlated not only with viral load but also with the severity of pneumonia in non-COVID-19 patients. Serum protein levels of PLA2G7 were elevated beyond the normal limit in COVID-19 patients, especially among re-positive patients. We thus identified and validated that PLA2G7, a biomarker for CVD, is abnormally elevated in COVID-19 at both the RNA and protein levels. These findings offer an indication of why cardiovascular involvement is prevalent in patients with COVID-19. PLA2G7 could be a potential prognostic marker and therapeutic target in COVID-19.

In summary, this study developed a combined analysis of host response and viral metagenomic pathogen identification to evaluate viral infections. The applications of Viral mNGS presented here, including public health emergency response, pathogen identification in unexplained infections, and discovery of host signatures of viral infection, provide insights into diagnosis, surveillance, and treatment.
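
As an illustration of the k-mer based classification mentioned for Chapter 1, the following is a minimal sketch and not the VIP2 implementation, whose internals are not described in this abstract. It assumes a toy in-memory index; the function names, the reference sequences, the k-mer length, and the `min_hits` threshold are all hypothetical choices made for the example.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping substrings of length k in seq."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, seq in references.items():
        for kmer in set(kmers(seq, k)):
            index.setdefault(kmer, set()).add(label)
    return index


def classify_read(read, index, k, min_hits=2):
    """Assign a read to the label with the most label-specific k-mer hits.

    K-mers shared by several references are ignored; a read with fewer
    than min_hits supporting k-mers is left unclassified (None).
    """
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer)
        if labels and len(labels) == 1:
            votes[next(iter(labels))] += 1
    if not votes:
        return None
    label, hits = votes.most_common(1)[0]
    return label if hits >= min_hits else None


# Hypothetical toy references; real pipelines index full genome databases.
references = {
    "virus_A": "ATGCGTACGTTAGCCGATAGCTAGGCTA",
    "virus_B": "TTGACCGGTATCCGGAATTCGGCCATAG",
}
index = build_index(references, k=8)
matched = classify_read("CGTACGTTAGCCGATA", index, k=8)
unmatched = classify_read("GGGGGGGGGGGGGGGG", index, k=8)
```

Production classifiers typically add reverse-complement handling, compact hashing, and taxonomy-aware resolution of shared k-mers, none of which is attempted here.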
Keywords/Search Tags:Metagenomics, Viral Infections, Host response, Bioinformatics