Host Genome Depletion And Pathogen Genome Enrichment Technology Based On Nanopore Sequencing

Posted on: 2024-06-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y F Lin
GTID: 1524307094476394
Subject: Pathogen Biology
Abstract/Summary:
Background: Infectious diseases are a key factor affecting human health. Rapid and accurate identification of pathogens is an important prerequisite for the diagnosis and treatment of infectious diseases. Traditional pathogen detection methods are time-consuming and have a low positive rate, making it difficult to detect newly emerging or rare pathogens. In recent years, nanopore metagenomic sequencing has become a useful method for pathogen diagnosis in infectious diseases. Compared with next-generation sequencing platforms, it offers portability, real-time sequencing, and long read lengths. However, the low data throughput of nanopore sequencing and the high host nucleic acid background in clinical samples greatly reduce the efficiency of pathogen detection. To enrich microbial genome sequences in samples and increase the sensitivity of pathogen detection, a series of methods have been developed, such as multiplex PCR amplification, probe hybridization capture, and selective lysis of host cells. However, these methods are generally applicable only to specific types of samples or pathogens, and most are based on next-generation sequencing platforms, so a universal microbial enrichment method suitable for long-read sequencing is still lacking. The nanopore real-time sequencing platform provides an "adaptive sequencing" function: a decision is made in real time as a sequence passes through the nanopore, so that targeted sequences continue to be sequenced while untargeted sequences are ejected by reversing the voltage across the nanopore and are no longer sequenced, thereby enriching the target sequences. This provides a new strategy for microbial genome enrichment based on nanopore metagenomic sequencing.

Objectives: This study addresses the problem of identifying low-abundance pathogens in clinical samples by metagenomic sequencing. Based on nanopore adaptive sequencing technology, two strategies including reverse removal of the host genome
and forward enrichment of the microbial genome are used for unknown and known target microorganisms, respectively. The study aims to establish a universal microbial genome enrichment method for nanopore metagenomic sequencing, to improve the efficiency of pathogen detection by nanopore metagenomic sequencing, and to provide technical support for the rapid detection of pathogens in response to infectious diseases.

Methods: For infected samples with an unknown pathogen, a strategy of reverse host genome removal was adopted. Targeted probes were designed to bind selectively to repetitive sequences in the human genome, and the smallest set of probes was synthesized to capture and remove human nucleic acids, thus increasing the proportion of microbial nucleic acids. The method was established using mock samples. The probe method was then applied to clinical samples with clear background information to remove host sequences. The samples then underwent adaptive sequencing, with the human genome used as the reference for ejecting human reads. The enrichment effect of the two methods was evaluated by counting bacterial and target pathogen reads, as well as the number of detected species and antimicrobial resistance genes. For infected samples with known pathogens, a strategy of forward enrichment of the microbial genome was adopted. Ligation-based and rapid-PCR-based nanopore adaptive sequencing workflows were established to evaluate the enrichment effect of different library preparation workflows. The genomes of known pathogens were used as a reference database to enrich pathogen sequences by adaptive sequencing. Furthermore, an enrichment workflow for single or multiple pathogen genomes was established, and a virus database was constructed based on the NCBI RefSeq database. Samples with known pathogens and samples with suspected virus infections were used to evaluate the enrichment effect for targeted pathogens and the timeliness of obtaining the genome sequence.

Results: First, a total of 451 targeted probes
based on human repetitive sequences were designed, and a workflow for probe-targeted removal of host sequences was established. Lambda bacteriophage and microbial community mock samples were constructed, and the optimal enrichment of microbial sequences was achieved with a probe input of 18 pmol. After treatment with the probe workflow, the number of microbial reads in the sample increased by 24.80% to 61.06%, and the number of microbial bases increased by 31.83% to 75.73%. The vast majority of the reads captured by the probes were human-derived (99.88% to 99.97%), with only 0.01% to 0.09% of the reads aligning to microorganisms. After treatment with the probe workflow, sequencing coverage of the microbial genomes showed good uniformity. Second, the probe workflow and adaptive sequencing were combined to remove the human genome from clinical samples with clear background information. The probe workflow increased the proportion of bacterial reads in the samples by an average of 29.46%. The numbers of target pathogen reads in the combined probe workflow and adaptive sequencing group, the adaptive sequencing group, and the probe workflow group were 3.10, 2.38, and 1.31 times that of the conventional sequencing group, respectively. The numbers of reads aligned to antimicrobial resistance genes in the same three groups were 2.99, 2.22, and 1.37 times that of the conventional sequencing group, respectively. The combined probe workflow and adaptive sequencing group detected more species and antimicrobial resistance genes in the same sequencing time, while also improving the coverage of antimicrobial resistance genes, greatly improving the enrichment of microbial genomes. Third, ligation-based nanopore adaptive sequencing (LNAS) and rapid-PCR-based nanopore adaptive sequencing (RPNAS) workflows were established. The enrichment effect of the two workflows was evaluated using clinical samples with clear
background information. The LNAS workflow could effectively identify the target pathogen sequences in the samples but failed to increase the data throughput of the targeted pathogen because of the short average fragment length of the library (<1000 bp). The RPNAS workflow increased the average fragment length of the library (>3000 bp) through long-fragment PCR and magnetic bead fragment filtering. Even though adaptive sequencing decreased total data throughput and accelerated nanopore inactivation, it still improved the relative abundance (7.87 to 12.86 times) and effective data throughput (1.27 to 2.15 times) of the target pathogen compared with the control group. Fourth, adaptive sequencing for multi-pathogen enrichment was established. Single-pathogen enrichment for SARS-CoV-2 was performed based on the RPNAS workflow. A multi-pathogen database containing all known viruses was established based on the RefSeq virus database. Within the same sequencing time, the number of bases and the relative abundance of SARS-CoV-2 in the adaptive sequencing group increased 1.52 to 2.90 times and 4.02 to 9.35 times, respectively, compared with the control group, and the adaptive sequencing group required less time (3.14 hours on average) to obtain the same SARS-CoV-2 genome coverage as the control group (16.10 hours on average). For animal samples suspected of virus infection, metagenomic next-generation sequencing was first performed to determine the virus types in the samples, and adaptive sequencing was then performed with the multi-pathogen database. Compared with the control group, the adaptive sequencing group obtained more total viral sequences (1.61 times on average), and the enrichment efficiency decreased as sequencing time increased. Adaptive sequencing enriched goose parvovirus, goose circovirus, duck hepatitis B virus, and avian adeno-associated virus in different samples, yielding higher viral read numbers and genome sequencing depth in the same sequencing time. Compared with the control
group, the adaptive sequencing group obtained the genome sequences of goose circovirus and duck hepatitis B virus in a shorter time. Phylogenetic analysis showed that both viral genomes were closely related to virus strains from China.

Conclusion: This study established a host genome removal and pathogen genome enrichment technology based on nanopore adaptive sequencing. For the detection of unknown pathogens in infected samples, probes targeting repetitive sequences in the human genome were designed to remove human-derived nucleic acid. A method combining the probe workflow and adaptive sequencing was established to remove the host genome. The numbers of target pathogen reads and antimicrobial resistance gene reads in the samples increased, and more species could be detected, which is helpful for the rapid and accurate diagnosis of pathogens in clinical infectious diseases. For the detection of known pathogens, a forward pathogen enrichment strategy was used to obtain more pathogen genome information. The RPNAS workflow was established and a multi-pathogen database covering the full spectrum of viruses was constructed to achieve single- or multi-pathogen enrichment. The effective data throughput for the target pathogen can be increased within the same sequencing time, enabling faster assembly of virus genome sequences and high-throughput screening of multiple pathogens. These methods effectively improve the ability of nanopore sequencing to rapidly and accurately identify clinical pathogens, and also provide important technical support for the detection of newly emerging or unknown pathogens.
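
The abstract reports two different fold changes for the target pathogen: relative abundance and effective data throughput. The minimal Python sketch below shows one plausible way the two could be computed, and why the first can be much larger than the second when adaptive sequencing lowers total throughput. The `RunSummary` type, its field names, and the base counts are hypothetical and chosen only for illustration; they are not the dissertation's data or its actual analysis code.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Base counts from one sequencing run (hypothetical structure)."""
    target_bases: int  # bases assigned to the target pathogen
    total_bases: int   # all bases sequenced in the same time window


def relative_abundance(run: RunSummary) -> float:
    """Fraction of sequenced bases that belong to the target pathogen."""
    return run.target_bases / run.total_bases


def fold_changes(adaptive: RunSummary, control: RunSummary) -> tuple[float, float]:
    """Return (relative-abundance fold, effective-throughput fold).

    Assumes both runs cover the same sequencing time, so target_bases
    can stand in for effective data throughput.
    """
    abundance_fold = relative_abundance(adaptive) / relative_abundance(control)
    throughput_fold = adaptive.target_bases / control.target_bases
    return abundance_fold, throughput_fold


# Made-up counts: the adaptive run yields fewer total bases because
# non-target reads are ejected, but more target bases.
control = RunSummary(target_bases=1_000_000, total_bases=1_000_000_000)
adaptive = RunSummary(target_bases=2_000_000, total_bases=200_000_000)

abundance_fold, throughput_fold = fold_changes(adaptive, control)
print(f"relative abundance fold: {abundance_fold:.2f}")
print(f"effective throughput fold: {throughput_fold:.2f}")
```

With these illustrative counts, the relative-abundance fold is larger than the throughput fold because the denominator (total bases) shrinks in the adaptive run, which matches the pattern of the RPNAS results reported above.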
Keywords/Search Tags: Nanopore sequencing, Metagenomic sequencing, Adaptive sequencing, Pathogen detection, Infectious disease