Research On Insertion/Deletion Variation Detection Method Based On Long-Read Sequencing Data

Posted on: 2024-02-01  Degree: Master  Type: Thesis
Country: China  Candidate: L L Li
GTID: 2530306917461294  Subject: Computer technology
Abstract/Summary:
Insertion/deletion (indel) variation is a relatively common type of genomic variation. It occurs both as small variants (length < 50 bp) and as structural variants (length ≥ 50 bp). As an important class of genetic variation, indels also play an important role in pathogenicity. Variant detection methods based on long-read sequencing data, which have emerged in recent years, detect a richer set of genomic variants. However, the diploid structure of the human genome and the presence of complex regions still make some complex heterozygous indels difficult to detect, so there is room to improve indel detection. To detect indels in human diploid genome sequencing data more accurately, this thesis proposes an indel detection method based on long-read sequencing data. The method consists of two main steps: classifying the reads within a region, and detecting indels from each region. The research covers the following three aspects:

(1) Method for classifying reads within a region

First, suspicious genomic regions are collected from Binary Alignment/Map (BAM) files. Next, the indel signatures in each suspicious region are extracted, and the similarity between the signature sequences carried by different reads is computed with a read alignment algorithm based on pairwise sequence alignment. Finally, using this similarity together with the diploid nature of the genome, the reads within each suspicious region are divided into two groups wherever possible.

(2) Method for detecting indels from regions

Within each suspicious genomic region, the method divides the sequencing data into at most two sets of reads. A consensus sequence is built for each read set with a partial order multiple sequence alignment algorithm, and a read smoothing step is added before the partial order alignment to improve the algorithm's performance and efficiency. After the partial order alignment, a filtering step based on sequencing depth is applied to further refine the variant breakpoints.

(3) Experiments with the indel detection method

The method is tested on simulated and real datasets of human chromosome 2. Compared with five other genomic variant detection methods based on long-read sequencing data, the experimental results show that this method achieves good indel detection performance.
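The read classification step in (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. A unit-cost edit distance stands in for the thesis's pairwise read alignment algorithm, and the function names `edit_distance`, `similarity`, and `split_reads`, the `same_threshold` value of 0.8, and the rule of seeding the two groups with the least similar pair of reads are all hypothetical choices made for illustration. Each read is represented only by the indel signature sequence it carries in one suspicious region, or an empty string if it shows no event there.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Unit-cost global alignment (Levenshtein) distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # ca aligned to a gap
                curr[j - 1] + 1,           # cb aligned to a gap
                prev[j - 1] + (ca != cb),  # match (cost 0) or mismatch (cost 1)
            )
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; two empty signatures count as identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def split_reads(signatures: dict[str, str],
                same_threshold: float = 0.8) -> list[list[str]]:
    """Partition reads into at most two groups by signature similarity.

    signatures maps a read name to the indel signature sequence it carries
    in the suspicious region ("" if the read shows no event there).
    same_threshold is a hypothetical cutoff, not a value from the thesis.
    """
    names = sorted(signatures)
    if len(names) < 2:
        return [names]
    # Seeds: the least similar pair, taken as representatives of the two haplotypes.
    seed_a, seed_b = min(
        combinations(names, 2),
        key=lambda pair: similarity(signatures[pair[0]], signatures[pair[1]]),
    )
    if similarity(signatures[seed_a], signatures[seed_b]) >= same_threshold:
        # Even the least similar pair is alike, so keep a single group.
        return [names]
    groups: tuple[list[str], list[str]] = ([], [])
    for name in names:
        s_a = similarity(signatures[name], signatures[seed_a])
        s_b = similarity(signatures[name], signatures[seed_b])
        groups[0 if s_a >= s_b else 1].append(name)
    return [g for g in groups if g]


if __name__ == "__main__":
    reads = {
        "r1": "ACGTTGCAACGT",  # reads carrying an inserted sequence
        "r2": "ACGTTGCAACGA",
        "r5": "ACGTTGCTACGT",
        "r3": "",              # reads without the insertion
        "r4": "",
    }
    print(split_reads(reads))
```

Run on the example above, `split_reads` returns `[['r1', 'r2', 'r5'], ['r3', 'r4']]`: the three reads with near-identical inserted sequences form one group and the two reads without the insertion form the other, as expected for a heterozygous insertion. Seeding with the most dissimilar pair reflects the diploid assumption that a region holds at most two haplotypes; when even that pair clears the threshold, all reads stay in one set, consistent with dividing the data into "at most two" sets.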
Keywords/Search Tags: variation detection, pairwise sequence alignment, partial order alignment, classify, long-read sequencing technology