Strategies For Screening Unknown Viral Pathogens By Using High-throughput Sequencing Technology

Posted on: 2015-09-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X P An
GTID: 1224330431473890
Subject: Microbiology
Abstract/Summary:
Rapid screening for the pathogens of emerging infectious diseases is one of the keys to early diagnosis and efficient prevention. High-throughput sequencing is the most powerful tool for identifying unknown infectious agents. Conventionally, total nucleic acid is extracted from the suspected sample and sequenced as is, so the sequenced material contains not only pathogen nucleic acid but also a large background of host gDNA and rRNA. This background consumes much of the sequencing capacity, which seriously reduces the amount of pathogen data and at times hampers pathogen screening. Moreover, conventional high-throughput sequencing requires samples of high quality and high quantity, and in many cases hard-won, precious samples are rejected by sequencing companies because of their low quality and/or low quantity. Here we study pretreatment methods and amplification methods that address these sequencing problems, so that pathogens can be screened not only from ordinary samples but also from low-quality and/or low-quantity samples, and so that as much pathogen sequence data as possible is obtained for downstream bioinformatics analysis. We tested various techniques on samples of different types, including respiratory tract swabs from patients, blood samples, and mosquito samples from the environment, in an effort to screen for pathogens.

The chief results of this study are as follows.

To sequence the adenovirus (ADV) genome from throat swabs, we compared Sanger sequencing with high-throughput sequencing protocols. Given an initial diagnosis of an ADV outbreak, we tried to obtain the whole genome sequence by Sanger sequencing and by 454 deep sequencing. Primers were designed with the ADV11a genome as the reference sequence, the samples were amplified by PCR, and the PCR products were sequenced directly by Sanger sequencing.
After assembly of these sequences, we obtained a total length of 28737 bp, covering 82.7% of the ADV whole genome (34.7 kb). In parallel, the gDNA sample from one throat swab and the PCR products from another throat swab were each sequenced by 454 unbiased deep sequencing. For the gDNA sample, only 23 of about 740,000 sequencing reads matched ADV; these assembled into contigs with a total length of 6215 bp, covering as little as 17.9% of the ADV whole genome. More than 90% of the output sequence data were derived from human genomic DNA, many sequences from normal bacterial flora such as Neisseria meningitidis were identified, and no other pathogen sequences were found. In contrast, for the PCR products derived from swab gDNA, 3125 of about 200,000 sequencing reads matched ADV; these assembled into contigs with a total length of 33070 bp, covering up to 95% of the ADV genome. Thus, with either Sanger sequencing or 454 deep sequencing, specific amplification of the target DNA gives better genome coverage of the target pathogen. The detection of ADV sequences, and of no other pathogen sequences, by deep sequencing of swab gDNA validated our initial diagnosis of ADV infection.

During the avian influenza H7N9 outbreak, we screened a large number of bird samples for avian influenza virus (AIV). After real-time PCR screening, three PCR protocols were adopted to amplify H7N9 virus sequences from samples with different virus concentrations. A total of 167 samples were screened by real-time PCR, and the positive samples were sequenced on the PGM. In total, sequences of thirty-three AIV strains were obtained, including full-length genome sequences for 7 strains, most of the genome sequence for 19 strains, and a minor part of the genome sequence for 7 strains. The analysis revealed that some samples contained both H7N9 and H9N2 strains, indicating mixed infections.

Because the original samples contain too much background nucleic acid, we tried to remove it.
After nuclease digestion of the background nucleic acid, we extracted the remaining nucleic acid, which had a very low DNA concentration. For samples with only trace amounts of DNA, we tried different non-specific amplification methods (MDA, SISPA, and anchored random PCR; see below) to increase the quantity of nucleic acid and obtain enough DNA for library preparation.

The MDA method was initially used to amplify the trace amounts of DNA from the original samples. Both DNA and cDNA (synthesized from RNA) were amplified as templates, and the MDA methods were divided into single-stranded MDA and double-stranded MDA, depending on whether second-strand cDNA was synthesized. After amplification and sequencing, a large number of TTV sequences were obtained by both MDA methods. After assembly, the whole genome sequence of a TTV (named TTV-Hebei-1) was obtained; it represents a novel genotype whose sequence is highly divergent from other reported TTV strains. The complete genome sequence of a novel phage (named IME-16) was also assembled, which has very low homology with other reported bacteriophages. These results suggest that the MDA method is particularly suitable for amplifying circular single-stranded DNA.

Nucleic acid from the sera of HCV patients was amplified by the MDA method and by the SISPA method in order to compare their amplification efficiency for RNA viruses. With MDA, extremely few reads matched HCV sequences (only 4 reads); they made up only 0.1% of the reads matching viruses and covered only 6% of the HCV genome. In contrast, more than 99.9% of the virus reads from SISPA matched HCV sequences. However, these reads covered only 28% of the HCV genome, with a large number of reads clustered in a few random regions of the genome. The low genome coverage is presumably due to excessive amplification of some regions of the genome sequence, which may be a characteristic of the SISPA method.
Since the standard SISPA method requires two separate protocols to amplify DNA and RNA, we simplified it for the detection of both DNA and RNA viruses by introducing a denaturation step before reverse transcription, and successfully detected both HBV and HCV.

We then attempted to amplify nucleic acid from a mixed serum sample containing HBV, HCV, and TTV by the MDA method and the anchored random PCR method. Total nucleic acid of the mixed serum sample, extracted without nuclease digestion and without amplification, served as the control. The control yielded 1.34 million reads, of which a large proportion were human sequences and only a small proportion were HBV sequences (61 reads) or HCV sequences (1 read), while no TTV sequences were identified. More TTV sequences were obtained by the single-stranded MDA method, and more HBV sequences by both the double-stranded MDA method and anchored random PCR. This bias in sequence representation may be due to the preferences of the different amplification methods.

Infectious agents carried by mosquitoes often cause outbreaks of a variety of infectious diseases. To screen for such pathogens, a new strategy based on small silencing RNA (siRNA) immunity to virus infection was proposed in our laboratory to detect novel DNA and RNA viruses. We improved the method to enhance sequence specificity by increasing the length of the sequenced RNA. At the same time, we simplified the experimental process by constructing an RNA library that avoids sequencing rRNA. With this strategy, a mosquito X virus named MXV-W3 was identified in a mosquito pool sample, and we demonstrated that this virus can infect mosquitoes independently of dengue virus coinfection.
After assembly, 95.1% of the fragment A genome and 92.8% of the fragment B genome of MXV-W3 were obtained.

Taken together, these findings suggest that, to obtain more information about the pathogens in a sample, it is critical to select an appropriate pretreatment method and a proper amplification method prior to high-throughput sequencing, and that this selection also depends on the purpose of the experiment and the kind of pathogen (e.g., DNA virus or RNA virus).
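
The genome coverage and ADV read fractions reported for the throat swab experiments follow from simple arithmetic, which the minimal Python sketch below reproduces. It is an illustration only, not the analysis pipeline used in the dissertation. It assumes that the reported total contig length is non-redundant, so that coverage is total length divided by genome length, and it uses the rounded 34.7 kb genome size and the approximate read totals given in the abstract. The function and variable names are hypothetical.

```python
# Illustrative check of the ADV figures quoted in the abstract.
# Assumption: coverage = total assembled length / genome length,
# i.e. the assembled sequences are treated as non-overlapping.

ADV_GENOME_BP = 34_700  # "34.7kb" in the abstract (rounded value)


def coverage_percent(assembled_bp, genome_bp=ADV_GENOME_BP):
    """Percentage of the genome represented by the assembled sequence."""
    return 100.0 * assembled_bp / genome_bp


def on_target_percent(matching_reads, total_reads):
    """Percentage of all sequencing reads that match the target virus."""
    return 100.0 * matching_reads / total_reads


# Values from the abstract; read totals are stated there as "about".
runs = {
    "Sanger, PCR products": {"assembled_bp": 28737},
    "454, swab gDNA": {"assembled_bp": 6215, "adv_reads": 23, "total_reads": 740_000},
    "454, PCR products": {"assembled_bp": 33070, "adv_reads": 3125, "total_reads": 200_000},
}

if __name__ == "__main__":
    for name, run in runs.items():
        line = f"{name}: coverage {coverage_percent(run['assembled_bp']):.1f}%"
        if "adv_reads" in run:
            frac = on_target_percent(run["adv_reads"], run["total_reads"])
            line += f", ADV reads {frac:.4f}% of total"
        print(line)
```

With these inputs the unamplified gDNA run has well under 0.01% ADV reads, whereas the PCR-amplified run has about 1.6%, a difference of roughly 500-fold. Because the genome length is rounded, the computed coverage can differ from the reported percentages in the last digit.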
Keywords/Search Tags: High-throughput sequencing, MDA, SISPA, Anchored random PCR, Adenovirus, Avian influenza virus