| With the development and wide application of high-throughput sequencing technology, metagenomics research on environmental and animal samples has advanced rapidly. In this study, viral metagenomics was applied to investigate virus diversity in several economically important shellfish along the coast of South China. The samples were divided into two batches. The first batch (designated “oyster samples”) contained Crassostrea hongkongensis collected in 2015-2016 from oyster culture areas of Qinzhou (Guangxi Province), Zhanjiang (Guangdong Province), Yangjiang (Guangdong Province), Zhuhai (Guangdong Province), Shenzhen (Guangdong Province), and Lianjiang (Fujian Province). The second batch (designated “Daya samples”) was collected from Daya Bay in Shenzhen in June 2016 and included Ruditapes philippinarum, Scapharca subcrenata, and Chlamys farreri. Total virus particles were extracted by a combination of differential centrifugation and ultracentrifugation. Nucleic acid was extracted after free host nucleic acid had been removed with nuclease. Multiple displacement amplification (MDA) was used to obtain a sufficient amount of DNA, 79 high-throughput sequencing libraries were then constructed, and 233.9 Gb of Illumina sequencing data were finally obtained (after compression, about 97 796 284 reads). Using these data, we analyzed and compared different sequencing library construction methods, applied and evaluated a variety of bioinformatics analysis tools, and conducted an in-depth analysis. The results provide information on the diversity of viruses in shellfish, improve the understanding of virus diversity in economically important shellfish along the coast of South China, and offer new ideas and references for research on shellfish viruses. The following main results were obtained: 1. Comparison of methods for library construction and short read annotation. The construction and sequence annotation of metagenomic sequencing libraries is an important step in
metagenomic research, one that determines the quality of the results. In this study, two strategies for constructing viral metagenomic libraries were compared. Our analysis showed that the WTA (whole transcriptome amplification) & WGA (whole genome amplification) method, which reverse-transcribes RNA into cDNA and then amplifies it together with DNA by whole genome amplification, yields more nucleic acid while still covering RNA viruses, and increases the success rate of library construction. However, the proportion of RNA viruses detected by this method is lower than that detected using WTA alone, which indicates that the WTA & WGA method preferentially amplifies DNA viruses, especially ssDNA viruses. Studies focused on RNA viruses should therefore adopt the WTA strategy. Moreover, higher quality libraries were obtained by agarose gel extraction than by AMPure bead size selection, so agarose gel extraction should be the preferred protocol for library construction. However, because the magnetic bead selection method is simple and further adjustment of the filtering parameters can also yield better sequencing results, it can serve as an alternative. Finally, we compared three annotation tools (BLAST, DIAMOND, and Taxonomer) and two reference databases (NCBI’s NR and UniProt’s UniRef). Considering the limitations of computing resources and data transfer speed, we propose using DIAMOND with UniRef for annotating metagenomic short reads, because it runs quickly while maintaining a good annotation rate. 2. Study on virus diversity in Crassostrea hongkongensis from coastal areas in South China. The magnetic bead method was used to select DNA fragments when the sequencing libraries for the oyster samples were constructed. The Agilent 2100-H assay showed that the fragment size was not uniform, with DNA fragments ranging from 196 bp to 1435 bp. Because the quality of the library inevitably affects the
quality of sequencing, the Q20 for all raw reads obtained from sequencing was 87.18%, the Q30 was 82.12%, the GC content was 44.71%, the sequencing error rate was 0.33%, and, with the default parameters, the effective rate of reads was 38.00%. After the parameters were adjusted, all raw reads from the oysters were re-filtered, which increased the yield of clean data and raised the effective utilization of reads to over 60% on average. Annotation of all libraries using DIAMOND and the NR non-redundant protein database showed that viral sequences accounted for 9.70% and cellular organisms for 13.19%, including 6.89% eukaryotes, 5.47% bacteria, and a few archaea. The viruses annotated in the DNA virus libraries were mainly uncultured marine viruses and Gokushovirinae, and those in the RNA virus libraries were mainly Beihai picobirna-like virus and Beihai mollusks virus. The types and abundance of viruses in oysters from the same area were similar, but they varied significantly among regions. The function of the viral metagenome (encoded gene function) was also affected by regional differences. 3. Analysis of virus diversity in shellfish samples from Daya Bay. The method of constructing viral metagenomic libraries was further improved, and agarose gel extraction was used to select DNA fragments. Libraries created using this method had higher adapter efficiency and less variation between samples. The Agilent 2100-H assay showed that the DNA fragments were concentrated at 422 bp, with uniform fragment size and small differences between libraries. The Q20 for all reads was 92%, the Q30 was 84%, and the sequencing error rate was 0.05%, indicating much higher sequencing quality than that obtained with the oyster library construction strategy; the GC content was 51% and the rate of clean data was 64.40%. The overall microbial annotation showed that cellular organisms accounted for 37.35%, of which bacteria accounted for 8.94% and eukaryotes for 20.45%, whereas annotated viruses accounted for only 1.93%. The virus content is
different in different shellfish, and the viral genome structure also differs. These differences may be mainly related to the source of the samples. The highly abundant annotated viruses were Podoviridae, Microviridae, Herpesviridae, and Circoviridae. The results showed that all samples contained large amounts of uncultured marine viruses and uncultured Mediterranean phage uvMED, together with a small amount of environmental viruses. Virus types and abundances differed among shellfish samples, and the function of the viral metagenome (encoded gene function) differed among shellfish species. More Malacoherpesviridae were detected in samples DYL, HJ, HL, JL, ML, XH, and ZHB. The percentage of viral reads in the Tegillarca granosa samples (XH group) reached 15.69%, of which 96.1% were herpesvirus reads; this is the first time that oyster herpesvirus (OsHV) has been detected in Scapharca subcrenata. |