Genotype Calling And SNP Detection For Single-cell DNA Sequencing Data

Posted on: 2019-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Huang
Full Text: PDF
GTID: 2370330566986570
Subject: Computer Science and Technology
Abstract/Summary:
A single nucleotide polymorphism (SNP) is a DNA sequence polymorphism caused by variation at a single nucleotide at the genomic level, and SNPs play an important role in human genetic mutations. Traditional high-throughput sequencing sequences many cells simultaneously and ignores the heterogeneity between cells, so the final results describe only an average over those cells. With the introduction of single-cell sequencing technology, it has become possible to detect single nucleotide variations in individual cells. However, because of noise and low coverage in single-cell sequencing data, accurately identifying genotypes and SNPs remains challenging. This thesis therefore takes single-cell sequencing data as its research object and establishes a model for detecting genotypes and SNPs.

First, the thesis presents an SNP analysis pipeline in detail. The pipeline consists of two modules: data processing, and genotype and SNP detection. The accuracy of SNP detection is closely tied to sequencing error, which in single-cell data results from the amplification required before sequencing. Quality control of the data is therefore essential for improving the accuracy of SNP detection.

Second, the thesis analyzes the sequencing error of single-cell sequencing data and proposes a model for identifying genotypes and SNPs based on the characteristics of single-cell sequencing. The model fits the SNP pattern under the assumption that the sequencing error follows a Gaussian distribution, and it integrates both base calling error and mapping error. The model is solved by dynamic programming.

In summary, the main innovations of this thesis are: 1) error in the analysis pipeline originates from two sources, namely base calling error and mapping error. Common methods take only the base calling error rate into consideration, whereas this thesis incorporates both error rates into the model; 2) a model for identifying genotypes and SNPs is proposed based on the sequencing error of single-cell sequencing data.

To assess the experimental results, bulk sequencing data was first used to construct a set of true SNPs, which served as the ground truth. The proposed method was then compared with another method in terms of the number of true SNPs detected, precision, and the bias towards transitions. The results showed that the proposed method improved on the other method in both the number of true SNPs and precision, and that its bias towards transitions also improved. The experiments indicate that the method proposed in this thesis can detect more mutations.
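The evaluation described above can be illustrated with a minimal sketch. It is not the thesis's code: the `Snp` record, the `evaluate` function, and the use of a transition/transversion ratio as the measure of transition bias are all assumptions made for illustration, since the abstract does not say how these quantities were computed. Called SNPs are compared against a truth set built from bulk data by exact match on position and alleles.

```python
from typing import NamedTuple, Optional, Set


class Snp(NamedTuple):
    """Hypothetical SNP record: chromosome, 1-based position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """A transition swaps purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)."""
    pair = {ref.upper(), alt.upper()}
    return pair == PURINES or pair == PYRIMIDINES


def evaluate(called: Set[Snp], truth: Set[Snp]) -> dict:
    """Compare called SNPs with a truth set (assumed here to come from bulk data)."""
    true_snps = called & truth  # calls confirmed by the truth set
    precision = len(true_snps) / len(called) if called else 0.0

    transitions = sum(is_transition(s.ref, s.alt) for s in called)
    transversions = len(called) - transitions
    # The ratio is undefined when there are no transversions.
    ti_tv: Optional[float] = (
        transitions / transversions if transversions else None
    )
    return {
        "true_snps": len(true_snps),
        "precision": precision,
        "ti_tv": ti_tv,
    }


if __name__ == "__main__":
    # Toy data, invented purely to exercise the functions.
    truth = {
        Snp("chr1", 100, "A", "G"),
        Snp("chr1", 200, "C", "A"),
        Snp("chr2", 50, "C", "T"),
    }
    called = {
        Snp("chr1", 100, "A", "G"),
        Snp("chr2", 50, "C", "T"),
        Snp("chr2", 70, "G", "T"),
        Snp("chr3", 5, "T", "C"),
    }
    print(evaluate(called, truth))
```

Matching on exact position and alleles is the simplest possible design choice; a real comparison would also need to agree on reference build and variant normalization, which this sketch does not address.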
Keywords/Search Tags: single nucleotide polymorphism, genotype, base calling error, mapping error