Performance Evaluation Of Methylation Mapping Tools And Development Of Bioinformatics Pipeline For Methylated Multiple PCR Targeted Sequencing Data

Posted on: 2024-06-16; Degree: Master; Type: Thesis
Country: China; Candidate: J J Lin
GTID: 2530307076486014; Subject: Biochemistry and Molecular Biology
Abstract/Summary:
BACKGROUND: DNA methylation is a major epigenetic modification that plays an important role in embryonic development, genetic imprinting, X chromosome inactivation, and tumorigenesis. In recent years, with the rapid development of next-generation sequencing technology, the detection rate of methylation sites has improved continuously. However, whole-genome bisulfite sequencing and reduced representation bisulfite sequencing are costly strategies for large sample sets with small target regions. In comparison, methylated multiple PCR targeted sequencing is low-cost, easy to operate, highly efficient in enrichment, and sparing of DNA, and it has broad application prospects. Dedicated mapping tools and analysis workflows for methylated multiple PCR targeted sequencing data are nevertheless lacking. The performance of mainstream methylation mapping tools on such data, and the establishment of a corresponding pipeline, therefore remain to be studied.

OBJECTIVE: This study aims to evaluate the performance of mainstream methylation mapping tools on two mouse methylated multiple PCR targeted sequencing datasets. The best-performing tool is then selected to establish and optimize an analysis pipeline and to create a reporting system.

METHODS: First, our laboratory used high-performance computers to store and analyze two experimental datasets, as well as a simulated dataset generated with Sherman. Second, we selected Bismarkbwt1, Bismarkbwt2, BS-Seeker2bwt1, BS-Seeker2bwt2, BSMAP, BWA-meth, ERNE-BS5, GEM3, and Segemehl, and evaluated their performance on methylated multiple PCR targeted sequencing data. The evaluation covered average CPU running time, average maximum memory usage, average mapping rate, F1-score, average mapping speed, mapping failure rate, the number of differentially methylated sites, and the effect of bisulfite conversion rate and sequencing error rate on the mapping rate. Third, the pipeline for processing methylated multiple PCR targeted sequencing data includes preliminary quality assessment of sequencing reads, quality control, building the modified reference sequence index, read mapping, filtering and correction of SAM file information, extraction of methylated sites, text format conversion, visualization of methylation levels, and identification and annotation of differentially methylated sites. Fourth, we conducted further data mining on the generated data and developed and optimized a reporting system.

RESULTS: First, the performance of each mapping method was determined through a scoring system and comprehensive evaluation. The top three methods were Bismarkbwt2 (8.098 points), BWA-meth (7.846 points), and Bismarkbwt1 (7.840 points). The F1-scores of all three methods were 1, and they achieved the best mapping rates under different bisulfite conversion rates and sequencing error rates. In addition, Bismarkbwt2 yielded the most differentially methylated sites and the lowest mapping failure rate, and performed well in both average maximum memory usage and average mapping rate. Second, Bismark in Bowtie2 mode was used as the mapping tool for the pipeline, and the pipeline was demonstrated on experimental dataset B, with a total running time of 3 h 20 min. The pipeline was optimized during construction, including the step judgment logic, tool running parameters, text processing code, and visualization code. Third, the established reporting system includes a bisulfite conversion matrix arranged by sample, a matrix of unique-read ratios arranged by sample, and a depth matrix for each sample arranged by reference ID. The running time of the reporting system was also optimized.

CONCLUSION: After a comprehensive and systematic evaluation, this study selected Bismark in Bowtie2 mode as the mapping tool for the pipeline that processes methylated multiple PCR targeted sequencing data. The pipeline consists of two main parts: data analysis and a reporting system. The analysis process can uncover information on various types of methylated motifs and offers fast analysis, simple intermediate operations, convenient error traceback, and highly readable data reports. It is suitable for studies of large sample sets with small target regions.
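
The abstract reports an F1-score for each mapping tool but does not state how it was computed. The following is a minimal sketch of one common way to score a mapper against simulated reads whose true origin is known, as is the case for Sherman-generated data. The function name `mapping_f1`, the dictionary-based inputs, and the counting rules (correctly placed read = true positive, misplaced read = false positive, unmapped read = false negative) are assumptions made for illustration, not the thesis's definition.

```python
from typing import Dict, Optional, Tuple

# (reference ID, 1-based start position); a hypothetical representation
Position = Tuple[str, int]


def mapping_f1(
    truth: Dict[str, Position],
    mapped: Dict[str, Optional[Position]],
    tolerance: int = 0,
) -> float:
    """F1-score of a mapper on simulated reads with known origins.

    Assumed counting rules (not taken from the thesis):
      true positive  - read mapped to its true reference within `tolerance` bp
      false positive - read mapped somewhere else
      false negative - read left unmapped (None or missing from `mapped`)
    """
    tp = fp = fn = 0
    for read_id, true_pos in truth.items():
        called = mapped.get(read_id)
        if called is None:
            fn += 1
        elif called[0] == true_pos[0] and abs(called[1] - true_pos[1]) <= tolerance:
            tp += 1
        else:
            fp += 1
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: four simulated reads, one placed correctly,
# one misplaced, one explicitly unmapped, one absent from the output.
truth = {
    "r1": ("amp1", 10),
    "r2": ("amp1", 50),
    "r3": ("amp2", 5),
    "r4": ("amp2", 9),
}
mapped = {"r1": ("amp1", 10), "r2": ("amp2", 50), "r3": None}
f1 = mapping_f1(truth, mapped)
```

Under these rules a tool reaches an F1-score of 1 only when every simulated read is mapped and placed at its true origin, which is the situation the abstract reports for the three top-ranked methods.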
Keywords/Search Tags: methylated multiple PCR targeted sequencing, bioinformatics analysis, mapping methods, pipeline building, reporting system