
Detection of Copy Number Variation Based on the Assembly of NGS and 3GS Data

Posted on: 2020-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: F Gao
Full Text: PDF
GTID: 2370330602961438
Subject: Computer technology
Abstract/Summary:
Next-generation sequencing (NGS) is currently a widely used sequencing technology. Its high accuracy and low cost have made it common in genome research, including variation detection and sequence assembly. Since the emergence of third-generation sequencing (3GS), 3GS data have attracted much attention because their reads are longer than NGS reads, and 3GS has been regarded as an alternative to NGS. However, the low accuracy of 3GS data and the high cost of the sequencing process create a conflict between data quality and cost in practical applications, so 3GS data serve as a complement to NGS data rather than a replacement. Many strategies exist for detecting genome structural variations. However, copy number variation has many subtypes and spans long regions, so relying on a single detection strategy causes many problems. Considering the current state of NGS and 3GS in variation detection, this thesis proposes a copy number variation detection method based on the joint assembly of NGS data and 3GS data. A read depth model and a deep learning framework are used to improve the accuracy and sensitivity of variation detection. The work is presented in three parts.

1. Four strategies for structural variation detection are introduced, and several popular assembly strategies based on the De Bruijn graph algorithm and the overlap-layout-consensus (OLC) algorithm are analyzed. The real sequencing data set and the benchmark variation data set provided by the 1000 Genomes Project are studied, and their existing problems are analyzed. Based on the needs of the research, 3GS data were introduced into the study. Using SNP and InDel detection methods, the 3GS data were corrected with NGS data, which balanced sequencing cost against correction error, and preliminary experimental data were obtained.

2. To better detect the deletion and duplication subtypes of copy number variation, a detection algorithm based on the joint assembly of NGS data and 3GS data is proposed. The De Bruijn graph algorithm and the OLC algorithm are used to perform two rounds of assembly, producing the contigs used for copy number variation detection. Throughout the assembly process, third-generation data play an important role in resolving the closed-loop problems caused by repetitive sequences, which ensures the accuracy of the assembly results.

3. The read depth strategy is combined with a deep learning model to detect copy number variation. The assembled contigs are mapped to the reference genome, and the read depth at each site is analyzed. Based on the relationship between copy number and LRR, images encoding the copy number are generated for training the deep learning model.

The proposed copy number variation detection method, AssCNV23, was evaluated in variation detection experiments on both simulated data and real data. On both high-coverage and low-coverage data, in the simulated and the real data sets, its accuracy, sensitivity, and breakpoint accuracy were better than those of current detection tools.
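
The read depth step in point 3 can be illustrated with a minimal sketch. It is not the AssCNV23 implementation: it assumes that LRR means the log2 ratio of a window's observed depth to a baseline depth, that the sample is diploid, and it replaces the deep learning model with simple rounding of the estimated copy number. The function names, the window representation, and the pseudocount are hypothetical choices made for illustration only.

```python
import math
from statistics import median


def log_ratios(depths, baseline=None):
    """Log2 ratio of each window's read depth to a baseline depth.

    Assumption: the baseline is the median window depth, taken to
    represent the normal (diploid) state.
    """
    if baseline is None:
        baseline = median(depths)
    # A small pseudocount avoids log2(0) for windows with no coverage.
    return [math.log2((d + 0.5) / (baseline + 0.5)) for d in depths]


def copy_numbers(ratios, ploidy=2):
    """Estimate an integer copy number per window as ploidy * 2**ratio."""
    return [round(ploidy * 2 ** r) for r in ratios]


def call_segments(cns, ploidy=2):
    """Merge consecutive windows that share a non-normal copy number.

    Returns (start, end, kind, copy_number) tuples with end exclusive.
    """
    segments = []
    start = None
    # The trailing sentinel (normal copy number) closes an open segment.
    for i, cn in enumerate(cns + [ploidy]):
        if start is not None and cn != cns[start]:
            kind = "deletion" if cns[start] < ploidy else "duplication"
            segments.append((start, i, kind, cns[start]))
            start = None
        if start is None and cn != ploidy and i < len(cns):
            start = i
    return segments


# Hypothetical per-window depths: a deletion-like dip and a duplication-like peak.
depths = [30, 31, 29, 15, 14, 30, 60, 61, 30, 30]
cns = copy_numbers(log_ratios(depths))
segments = call_segments(cns)
```

In this toy input, windows at roughly half the baseline depth are called as deletions and windows at roughly twice the baseline depth as duplications. A learned model, as described in the abstract, would take the place of the fixed rounding rule.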
Keywords/Search Tags: copy number variation, joint assembly, read depth, deep learning