
Multi-dimensional Analysis Of Single-cell Whole-genome Data Based On A Large Number Of Samples

Posted on: 2021-02-24; Degree: Master; Type: Thesis
Country: China; Candidate: Y H Tao; Full Text: PDF
GTID: 2480306476460444; Subject: Biomedical engineering
Abstract/Summary:
After nearly 20 years of development, high-throughput sequencing has become one of the most important tools for studying the whole human genome. As the technology has matured, researchers have become increasingly interested in the genome of a single cell rather than the averaged signal of a cell population, and single-cell sequencing has emerged to meet that need. Single-cell sequencing is now widely used to study the evolution of tumor mutations, genetic recombination in germ cells, somatic mutation, and other biological and genetic questions. Existing single-cell studies usually focus on copy number variation (CNV). However, a single cell contains only 2 copies of genomic DNA, so library construction and sequencing are strongly affected by handling and experimental conditions, and some DNA fragments may be lost. In addition, the whole-genome amplification techniques used for pre-amplification each have their own amplification biases. A comprehensive comparison of existing single-cell whole-genome sequencing data, aimed at determining the standards and CNV analysis modes suited to single-cell data, can therefore guide single-cell sequencing work. This thesis studies single-cell genome data and designs a series of bioinformatics analyses built on a variable-window copy number variation detection pipeline, evaluating and comparing sequencing depth and window size along several dimensions. The main research contents and results are as follows:

1. In copy number variation analysis, sequencing depth affects the coverage depth of each window, and window size affects the resolution of CNV detection. We therefore surveyed and collected a large amount of published single-cell sequencing data and designed a bioinformatics pipeline covering coverage analysis, amplification uniformity analysis, coverage depth correlation analysis, copy number accuracy evaluation, and cluster analysis, in order to comprehensively evaluate the impact of amplification method, sequencing depth, window size, sample source, and other factors on the analysis results.

2. To evaluate the effect of sequencing depth and window size on the results, we selected 20 high-depth (above 5×) single-cell sequencing datasets from 5 studies, using 5× as the benchmark sequencing depth. The results show that as sequencing depth increases, the correlation coefficient between the window coverage depth of the data and that of the benchmark keeps increasing. With a 50 kb window, once the sequencing depth reaches 0.75× or more, the correlation coefficient is above 0.95 and the copy number accuracy is above 0.9. Likewise, as window size increases, the correlation coefficient between window coverage depth and the benchmark keeps increasing. With a window size of 250 kb or more, a correlation coefficient above 0.95 and a copy number accuracy of 0.95 can be obtained even at a sequencing depth as low as 0.1×. Single-cell sequencing can therefore yield high-resolution data (50 kb windows) once the sequencing depth exceeds 0.75×, and with windows of 250 kb or more, the results of high-depth data can be recovered at low sequencing depth (0.1×). Researchers can choose the depth and window size that fit their accuracy requirements and experimental conditions.

3. To evaluate the impact of whole-genome amplification methods on single-cell CNV analysis, we selected 101 single-cell sequencing datasets with higher sequencing depth (above 1×), covering the DOP-PCR, MDA, and MALBAC amplification methods, using 1× as the benchmark sequencing depth. At the same sequencing depth, DOP-PCR had the lowest coverage and MDA the highest, while DOP-PCR had the best amplification uniformity and MDA the worst. With a 175 kb window, DOP-PCR needs only 0.1× of data to reach a correlation coefficient of 0.95 with the benchmark, whereas MDA and MALBAC need 0.2× and 0.6×, respectively. With the same 175 kb window, DOP-PCR again needs only 0.1× for the copy number accuracy against the benchmark to exceed 95%, whereas MDA and MALBAC need 0.2× and 0.4×. Overall, all three amplification methods can accurately recover the results of 1× sequencing data with a 175 kb window; DOP-PCR requires the least data and MALBAC the most.

4. To explore the impact of sample type on CNV detection, we divided the single cells into four categories by cell type: somatic cells, germ cells, nerve cells, and tumor cells, with the benchmark sequencing depth set to 1×. The results show that tumor cells carry more abnormal copy number events and that their copy number accuracy is much lower than that of the other three cell types. This indicates that single cells with complex CNV patterns require a higher sequencing depth. Dimensionality reduction analysis of all single cells showed that cells with simple CNV patterns, such as ordinary diploid cells, remain highly similar to one another even when different whole-genome amplification methods are used. Statistical analysis of the abnormal copy number events across all tumor cells revealed regions of the genome where copy number abnormalities occur at high frequency, which points to genes in those regions as candidates for studying their relationship with tumor development.

In summary, this thesis studies single-cell genome data and provides guidance for choosing sequencing depth and window size under different experimental conditions, sample types, amplification methods, and accuracy requirements. It also identifies regions of the tumor cell genome where copy number abnormalities occur at high frequency. These results are valuable for optimizing the single-cell sequencing workflow and its parameter choices, and offer leads for CNV-based studies of tumor development.
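
The two metrics used throughout the abstract, the correlation of window coverage depth with a benchmark and the copy number accuracy, can be illustrated with a small simulation. This is a minimal sketch, not the thesis pipeline: it assumes fixed-width windows (the thesis uses a variable-window process whose details are not given here), emulates lower sequencing depth by randomly subsampling reads, and calls integer copy number by rounding median-normalized counts. All function names and the simulated genome are hypothetical.

```python
import numpy as np

def simulate_reads(rng, genome_length=10_000_000, n_reads=200_000,
                   gain=(2_000_000, 4_000_000)):
    """Hypothetical toy genome: diploid everywhere except one 3-copy region."""
    base = rng.integers(0, genome_length, size=n_reads)
    # A 3-copy region has 1.5x the read density of a 2-copy region,
    # so add half of the baseline reads expected inside the gain.
    expected_in_gain = n_reads * (gain[1] - gain[0]) // genome_length
    extra = rng.integers(gain[0], gain[1], size=expected_in_gain // 2)
    return np.concatenate([base, extra])

def window_counts(positions, genome_length, window_size):
    """Count reads per fixed-width window (assumption: fixed, not variable)."""
    n_windows = int(np.ceil(genome_length / window_size))
    return np.bincount(np.asarray(positions) // window_size,
                       minlength=n_windows)

def downsample(positions, fraction, rng):
    """Keep each read independently with probability `fraction`."""
    positions = np.asarray(positions)
    return positions[rng.random(positions.size) < fraction]

def copy_number_calls(counts, ploidy=2):
    """Round median-normalized window counts to integer copy numbers."""
    median = np.median(counts)
    if median == 0:
        raise ValueError("median window count is zero; use larger windows")
    return np.rint(ploidy * counts / median).astype(int)

def compare_to_benchmark(sample_counts, benchmark_counts):
    """Return (Pearson r of window depth, fraction of matching CN calls)."""
    r = np.corrcoef(sample_counts, benchmark_counts)[0, 1]
    accuracy = np.mean(copy_number_calls(sample_counts)
                       == copy_number_calls(benchmark_counts))
    return float(r), float(accuracy)

def evaluate(window_size, fraction=0.1, seed=0, genome_length=10_000_000):
    """Compare a downsampled cell against its own full-depth benchmark."""
    rng = np.random.default_rng(seed)
    reads = simulate_reads(rng, genome_length=genome_length)
    benchmark = window_counts(reads, genome_length, window_size)
    sample = window_counts(downsample(reads, fraction, rng),
                           genome_length, window_size)
    return compare_to_benchmark(sample, benchmark)
```

Calling `evaluate(250_000)` and `evaluate(5_000)` on this toy genome compares a large and a small window at the same downsampling fraction; the larger window pools more reads per window, which is the mechanism behind the trend the abstract reports. The absolute values from this simulation are not comparable to the thesis results.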
Keywords/Search Tags:Single Cell Sequencing, Copy Number Variation, Whole Genome Amplification