
Research On High-throughput Sequencing Of Somatic Gene Mutations To Detect Reference Materials For Bioinformatics Analysis

Posted on: 2020-12-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Li
GTID: 1360330578483557
Subject: Clinical Laboratory Science

Abstract/Summary:
Cancer is one of the leading causes of death and a major public health problem in China. With the rapid development of personalized medicine, targeted drug therapy guided by each patient's tumor mutation profile plays an increasingly important role in cancer treatment. Because a large number of gene mutations relevant to cancer diagnosis and treatment have been identified, traditional single-mutation detection methods can no longer meet clinical needs. In recent years, next-generation sequencing (NGS) has made it possible to detect multiple loci in multiple genes simultaneously. However, the analytic phase of NGS differs most from that of traditional molecular diagnostic methods in that it involves multiple experimental steps and complex bioinformatics analysis. As an integral component of NGS, the bioinformatics pipeline used to detect genomic mutations has a significant impact on genetic test results. For clinical NGS testing to yield accurate and reliable results, a proper reference dataset is a prerequisite for developing and validating bioinformatics pipelines and for conducting external quality assessment (EQA) programs.

Although sequencing data from real-world clinical or synthetic DNA samples can be used as reference datasets for clinical bioinformatics analysis, such data cannot represent the full range of variant types and variant allele frequencies (VAFs) encountered in clinical scenarios, and they are expensive to obtain. In silico sequence files generated by editing raw sequencing reads provide a valuable resource for evaluating bioinformatics pipelines, because introducing a range of sequence variants, in various combinations and at various VAFs, is a straightforward, quick, and inexpensive process. However, the existing variant simulation software, BAMSurgeon, has several limitations. First, BAMSurgeon cannot simulate some important subtypes of cancer driver structural variants (SVs), such as inter-chromosomal rearrangements, and it cannot add SVs to targeted sequencing data, which are routinely used in clinical practice. Second, BAMSurgeon does not support the simulation of copy number variations (CNVs) or complex deletion-insertion variants. Third, BAMSurgeon cannot simulate the flow signal information in sequencing data from the Ion Torrent system.

In this study, we developed VarBen, a variant simulation tool that generates user-specific reference datasets from real sequencing data, so that the simulated data emulate the real-world environment of the wet-laboratory process. To evaluate the reliability and robustness of VarBen, we performed a series of proof-of-principle validation studies. First, we compared SNV and Indel calling performance on simulated datasets generated by VarBen and BAMSurgeon with that on the curated MB gold set. SNV and Indel calling performance on the simulated data was highly comparable to that on the MB gold set, indicating no bias in the simulated data compared with real-world data. To exclude the influence of genomic background, aligner, and random split-read division, we compared variant calling performance across different sequencing datasets, aligners, and random read-splitting divisions. The results show that the simulated variants were independent of genomic background, aligner, and random split-read division. We further evaluated the suitability of VarBen for targeted sequencing data: all simulated variants were correctly detected in two targeted sequencing datasets generated on the Illumina and Ion Torrent platforms. Overall, these validation studies demonstrated the reliability and robustness of VarBen as an unbiased and powerful calibration tool for somatic variant simulation.

To evaluate the proficiency of laboratories that use NGS to detect somatic mutations, we implemented an EQA program for NGS bioinformatics and received 113 submissions in total. This EQA study shows that Indel detection appears to be particularly challenging, with performance lagging behind that of SNV detection, especially for complex deletion-insertion variants and the FLT internal tandem duplication (ITD) variant.

In summary, we developed VarBen to generate synthetic reference datasets for benchmarking somatic variant calling pipelines. Compared with existing variant simulation methods, VarBen offers several benefits: it can simulate complex deletion-insertion variants, large SVs, and CNVs in both whole-genome and targeted sequencing data, and it can handle sequencing data from a broad range of platforms, e.g., Illumina, BGI, and Ion Torrent. VarBen retains characteristics intrinsic to raw sequencing data from physical specimens, such as the distribution of quality scores and depth of coverage, and therefore better emulates real-world sequencing data. Recognizing a pipeline's defects is a prerequisite for optimizing it. Thus, to ensure reliable test results, a customized, user-specific reference dataset is essential for developing and validating bioinformatics pipelines in clinical NGS testing.
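The read-editing idea in this abstract, introducing a variant into existing reads so that it appears at a chosen VAF, can be illustrated with a minimal Python sketch. This is not VarBen's implementation: the `Read` type, `spike_snv`, and `observed_vaf` are hypothetical names, and the sketch assumes gapless alignments (no indels or soft clips in the alignment), unique read names, plain strings instead of BAM records, and single-base substitutions only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical toy aligned read: gapless alignment, no soft clips."""
    name: str   # assumed unique per read
    start: int  # 0-based reference position of the first base
    seq: str    # a real tool would also carry base qualities; omitted here


def reads_covering(reads, pos):
    """Return the reads whose aligned span includes reference position `pos`."""
    return [r for r in reads if r.start <= pos < r.start + len(r.seq)]


def spike_snv(reads, pos, alt, vaf, seed=0):
    """Hypothetical helper: write `alt` at `pos` into a random subset of the
    covering reads so that about `vaf` of them carry the variant.
    Reads that are not selected are returned unchanged."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be in [0, 1]")
    if len(alt) != 1:
        raise ValueError("this sketch only handles single-base substitutions")
    covering = reads_covering(reads, pos)
    n_edit = round(vaf * len(covering))   # number of reads to modify
    rng = random.Random(seed)             # seeded so the random split is reproducible
    chosen = {r.name for r in rng.sample(covering, n_edit)}
    edited = []
    for r in reads:
        if r.name in chosen:
            i = pos - r.start             # offset of `pos` within this read
            r = Read(r.name, r.start, r.seq[:i] + alt + r.seq[i + 1:])
        edited.append(r)
    return edited


def observed_vaf(reads, pos, alt):
    """Fraction of covering reads that show `alt` at `pos`."""
    covering = reads_covering(reads, pos)
    if not covering:
        return 0.0
    hits = sum(1 for r in covering if r.seq[pos - r.start] == alt)
    return hits / len(covering)


if __name__ == "__main__":
    reads = [Read(f"r{i}", 0, "ACGTACGTAC") for i in range(20)]
    edited = spike_snv(reads, pos=4, alt="T", vaf=0.25, seed=1)
    print(observed_vaf(edited, 4, "T"))  # 5 of 20 covering reads edited -> 0.25
```

Measuring the observed VAF after editing is a simple way to confirm that the spike-in hit its target, and varying `seed` mimics different random read splits. Editing real BAM files would additionally require handling CIGAR operations, mate pairs, and realignment, which this sketch deliberately ignores.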
Keywords/Search Tags: Cancer, somatic variant, next-generation sequencing, reference materials, external quality assessment