
The Study of the Factors Affecting the Quality of Whole Genome Sequencing Data: AT Split

Posted on: 2021-05-31   Degree: Master   Type: Thesis
Country: China   Candidate: Y X Chen
GTID: 2370330611465917   Subject: Biological engineering
Abstract/Summary:
Whole genome sequencing (WGS) is a routine technique for sequencing and analyzing the genomes of individuals whose genomic sequences are unknown. Genomic information is vital for identifying genetic diseases, characterizing mutations that drive cancer progression, and tracking disease outbreaks. Research in human genetics and population genetics based on whole-genome resequencing can help analyze genomic diversity and genetic evolution and identify disease-causing genes, susceptibility genes, and other variants. At present, whole-genome sequencing is performed mainly on next-generation sequencers. Each step, including sample extraction, library construction, and sequencing, affects the quality and yield of the sequencing data, which in turn affects the results of downstream bioinformatic analysis. Obtaining high-quality data is therefore a prerequisite for comprehensive and correct bioinformatic analysis.

AT split is a key metric in the quality monitoring of sequencing data. It may affect variant detection and CNV analysis downstream. It may also cause low coverage of some high-GC regions, reducing the accuracy of results in those regions, and inaccurate variant analysis may in turn yield incorrect microsatellite site information. Although the magnitude of AT split generally has little effect on bioinformatic analysis, we analyzed and optimized the factors that influence this metric in order to minimize that effect and ensure the accuracy and reliability of the sequencing data.

In this research, WGS libraries were sequenced on sequencers independently developed by BGI (BGISEQ-500 and DIPSEQ), and the sequencing data were analyzed. Four factors affecting the AT split of sequencing data were identified and optimized experimentally in order to reduce AT split and improve data quality. The results are as follows.
(1) DNB (DNA nanoball) concentration affects the AT split of sequencing data: the higher the DNB concentration, the greater the AT split. At DNB concentrations of 12-20 ng/μL, data quality is better and the AT split is relatively small; below 8 ng/μL, data quality is worse.
(2) The library construction method affects the AT split of sequencing data. The library construction process was optimized by adding a digestion reaction after library construction and quantification, and the DNB preparation step of the sequencing process was optimized by changing the library input from 15 μL to 6 ng. After optimization, the sequencing results of 1421 samples were analyzed, showing that the process optimization effectively reduces the AT split of sequencing data.
(3) The input amount of ssDNA library affects the AT split of sequencing data. Inputs of 6 ng and 4 ng ssDNA were compared in DNB preparation. On the BGISEQ-500 platform, a comparison of the sequencing results of 3820 samples showed that adjusting the input effectively reduces the AT split; on the DIPSEQ platform, a comparison of 65 samples showed no obvious effect.
(4) Enzyme X affects the AT split of sequencing data. By optimizing the experimental procedure, the Enzyme X concentration was set to 75 ng/μL. On the DIPSEQ platform, the sequencing results of 79 samples showed that this optimization effectively reduces the AT split of sequencing data.

In short, this study optimized the experimental process. Experimental tests showed that the AT split of whole genome sequencing data can be effectively reduced, and the method was applied in production. The AT split of the final sequencing data was effectively reduced, which greatly improves the quality of whole genome data and provides a solid foundation for subsequent data analysis.
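The abstract does not state how AT split is calculated. As a minimal sketch, assuming AT split is the largest absolute difference between the A and T base percentages across sequencing cycles (read positions), the metric could be computed from raw read sequences as follows. That definition is an assumption, the function names `per_cycle_base_percent` and `at_split` are hypothetical, and this is not the QC pipeline used in the study.

```python
from collections import Counter
from typing import Dict, Iterable, List


def per_cycle_base_percent(reads: Iterable[str]) -> List[Dict[str, float]]:
    """For each cycle (read position), return the percentage of A, C, G, T calls.

    Non-ACGT calls (e.g. N) are excluded from the denominator.
    """
    counts: List[Counter] = []
    for read in reads:
        for cycle, base in enumerate(read.upper()):
            # Grow the per-cycle table when a read is longer than any seen so far.
            if cycle == len(counts):
                counts.append(Counter())
            if base in "ACGT":
                counts[cycle][base] += 1

    percents: List[Dict[str, float]] = []
    for c in counts:
        total = sum(c.values())
        if total == 0:
            # No valid calls at this cycle: report zero for every base.
            percents.append({b: 0.0 for b in "ACGT"})
        else:
            percents.append({b: 100.0 * c[b] / total for b in "ACGT"})
    return percents


def at_split(reads: Iterable[str]) -> float:
    """Hypothetical AT split: max over cycles of |A% - T%| (assumed definition)."""
    profile = per_cycle_base_percent(reads)
    if not profile:
        return 0.0
    return max(abs(p["A"] - p["T"]) for p in profile)


if __name__ == "__main__":
    # Toy reads for illustration only; not data from the study.
    print(at_split(["AC", "AG", "TC", "AA"]))
```

For the toy reads above, cycle 1 has 75% A and 25% T, so the sketch reports an AT split of 50.0. Because N calls are dropped from each cycle's denominator, a cycle with no valid calls contributes 0.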
Keywords/Search Tags: WGS, AT split, DNB concentration, library concentration, library input, Enzyme X