
Copy Number Variants Calling And Noise Analysis For Single Cell Sequencing Data

Posted on: 2018-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: C S Zhang
Full Text: PDF
GTID: 2310330536978207
Subject: Engineering
Abstract/Summary:
Copy number variation (CNV), which is caused by genome rearrangement, generally refers to an increase or decrease in the copy number of large genome segments longer than 1 kb. Such variations mainly appear as sub-microscopic deletions and duplications. Copy number variation is an important component of genome structural variation, and its mutation rate is much higher than that of single nucleotide polymorphisms (SNPs). It is one of the important pathogenic factors of human disease. The traditional techniques for detecting copy number variation, including comparative genomic hybridization and fluorescence in situ hybridization, are limited in practical application by high cost, low throughput and limited resolution. Since it was first reported in 2005, high-throughput sequencing technology has been widely used in various fields of life science research, such as de novo sequencing, RNA sequencing and epigenetics. It offers high throughput, low cost and a high output of genomic information.

Before the advent of single cell sequencing technology, high-throughput sequencing was mainly applied to tissue samples containing many cells. Such sequencing results represent an average over many cells and ignore the heterogeneity of single cell genomes. With the arrival of single cell sequencing technology, it has become possible to detect the copy number variation of individual cells. Single cell sequencing, which sequences the DNA or RNA of a single cell, can reveal the gene expression status and genomic variation profile of that cell, and can therefore reflect the heterogeneity among cells. It plays an important role in the study of tumors, developmental biology, neuroscience and other fields. Single cell sequencing consists of the isolation of individual cells, whole genome amplification, sequencing library construction and sequencing. Unlike traditional tissue sequencing, the whole genome amplification step of single cell sequencing introduces significant amplification bias, low genome coverage and other issues. As a result, single cell sequencing data carry their own specific noise.

In this thesis, we first introduce copy number variation and its detection technologies. We then describe the preprocessing of sequencing data and analyze the noise in single cell sequencing data. Finally, we propose a copy number variation detection model designed specifically for single cell sequencing data. The model uses sparsity and smoothness constraints to fit the copy number patterns, under the assumption that the read depth signals of single cell sequencing data follow negative binomial distributions. The detection problem is formulated as a quadratic optimization problem and solved by an efficient numerical method based on the classical alternating direction minimization method. The experimental results show that the proposed model performs well and stably in copy number variation calling on single cell sequencing data. We have packaged the detection method and deployed it at BGI online for researchers to use. Single cell sequencing is becoming a hotspot in life science research. This thesis explores its noise characteristics at the data level and provides an effective model for detecting copy number variations, which is expected to promote the development of data analysis for single cell sequencing.
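
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of fitting a signal under sparsity and smoothness constraints with an alternating direction method. It is not the thesis's model: it assumes a plain squared-error loss in place of the negative binomial read depth assumption, treats the input as a hypothetical per-bin deviation from the normal copy number (for example a log ratio), and the function name `sparse_smooth_fit` and the penalty weights `lam_sparse` and `lam_smooth` are illustrative choices.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * |v|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_smooth_fit(y, lam_sparse=0.1, lam_smooth=2.0, rho=1.0, n_iter=500):
    """Illustrative ADMM solver (an assumption, not the thesis's algorithm) for

        min_x 0.5 * ||y - x||^2 + lam_sparse * ||x||_1 + lam_smooth * ||D x||_1

    where D takes differences of neighbouring bins. The L1 term favours
    bins with no deviation (sparsity); the difference term favours
    piecewise-constant segments (smoothness).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), axis=0)            # (n-1, n) first-difference operator
    # Fixed system matrix of the quadratic x-update.
    M = (1.0 + rho) * np.eye(n) + rho * (D.T @ D)
    z1, u1 = np.zeros(n), np.zeros(n)         # split copy of x and its scaled dual
    z2, u2 = np.zeros(n - 1), np.zeros(n - 1) # split copy of D x and its scaled dual
    x = np.zeros(n)
    for _ in range(n_iter):
        # x-update: solve the linear system of the quadratic subproblem.
        rhs = y + rho * (z1 - u1) + rho * (D.T @ (z2 - u2))
        x = np.linalg.solve(M, rhs)
        Dx = D @ x
        # z-updates: soft-thresholding handles the two L1 penalties.
        z1 = soft_threshold(x + u1, lam_sparse / rho)
        z2 = soft_threshold(Dx + u2, lam_smooth / rho)
        # Scaled dual updates.
        u1 += x - z1
        u2 += Dx - z2
    return x

# Synthetic example (made-up data): 200 bins with one gained segment.
rng = np.random.default_rng(0)
truth = np.zeros(200)
truth[80:120] = 1.0
y = truth + rng.normal(scale=0.1, size=truth.size)
fit = sparse_smooth_fit(y)
```

In this sketch the fitted profile is close to zero away from the simulated gain and roughly constant inside it, shrunk somewhat toward zero by the two penalties. Replacing the squared-error term with a negative binomial likelihood, as the abstract describes, would change the update for `x` and is not attempted here.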
Keywords/Search Tags:CNVs, negative binomial distribution, quadratic optimization, smoothness, sparsity