Integrated Sequence Assembly Based Approach For Calling Genomic Long Insertion

Posted on: 2018-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: L Ye
Full Text: PDF
GTID: 2428330551957985
Subject: Software engineering
Abstract/Summary:
With the rapid development of high-throughput sequencing technology, many methods for detecting structural variation from high-throughput sequencing data have emerged. Because of the inherent limitations of high-throughput sequencing, such as short reads and sequencing errors, conventional detection methods are limited in both accuracy and sensitivity. To address this problem, this thesis proposes ISALins, an integrated method for detecting genomic insertions based on sequence assembly. The main contents of this thesis are as follows:

(1) A workflow for detecting genomic insertions is designed, and the experimental procedures used in the workflow are analyzed. Because the number of insertions released by the 1000 Genomes Project is too small, a benchmark set of insertions is generated programmatically for the simulation experiments. To verify the experimental results fully, real sequencing data from the individual NA12878 and the corresponding variant benchmark data were also prepared. The simulated and real data together form the data foundation for this work.

(2) The features of insertions are analyzed, and a clustering algorithm is proposed to cluster the reads that support an insertion, ensuring the validity of the subsequent sequence assembly and variant detection. In addition, an effective strategy for resolving repetitive sequences is proposed, based on a de Bruijn graph sequence assembly algorithm.

(3) An integrated insertion detection strategy based on sequence assembly is proposed, and experiments are carried out on both simulated and real data. The strategy is implemented in four stages. In the first stage, to maintain detection sensitivity and improve the accuracy of long insertion detection, the results of multiple tools are merged to obtain an initial set of candidate insertion breakpoints. In the second stage, the OEA fragments near each candidate breakpoint are clustered, and high-quality soft-clipped reads are obtained by analyzing the soft-clipped reads. In the third stage, local assembly is performed with a de Bruijn graph based method, using dynamic k-mer and k-mer frequency analysis strategies to eliminate misassemblies caused by repetitive genomic sequences. In the fourth stage, insertions are detected by mapping the contigs to the reference genome using bwa and blat.

The experimental results show that the proposed method performs well on both high-coverage and low-coverage sequencing data and, compared with traditional insertion detection methods, improves the accuracy of structural variation detection to a certain extent...
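The third stage can be illustrated with a minimal Python sketch. This is not the ISALins implementation, which the abstract does not detail. It is a toy example under stated assumptions: all function names are hypothetical, the k-mer frequency filter is a simple count threshold (`min_count`), and "dynamic k-mer" is read as retrying the assembly with a larger k whenever the graph still branches or cycles.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(kmers):
    """de Bruijn graph: (k-1)-mer prefix -> set of (k-1)-mer suffixes."""
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_unambiguous(reads, k, min_count=2):
    """Return one contig if the filtered graph is a single simple path.

    k-mers seen fewer than min_count times are treated as likely
    sequencing errors and dropped (assumed frequency filter).
    Returns None if the graph branches, cycles, or has no unique start.
    """
    counts = count_kmers(reads, k)
    solid = [kmer for kmer, c in counts.items() if c >= min_count]
    graph = build_graph(solid)

    indegree = Counter()
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    starts = [node for node in graph if indegree[node] == 0]
    if len(starts) != 1:
        return None

    node = starts[0]
    contig = node
    visited = {node}
    while node in graph:
        successors = graph[node]
        if len(successors) != 1:
            return None  # branch: a repeat is not resolved at this k
        node = next(iter(successors))
        if node in visited:
            return None  # cycle caused by a repeat
        visited.add(node)
        contig += node[-1]
    return contig


def assemble_dynamic_k(reads, k_values=(5, 7, 9), min_count=2):
    """Try increasing k until the local assembly is unambiguous."""
    for k in k_values:
        contig = assemble_unambiguous(reads, k, min_count)
        if contig is not None:
            return k, contig
    return None


# Toy data: the 4-mer "CCGT" occurs twice, so k=5 branches; k=7 resolves it.
SEQ = "ATGCCGTACCGTTAGGAC"
READS = [SEQ, SEQ, "ATGCCGTACCGATAGGAC"]  # third read carries one error
print(assemble_dynamic_k(READS))
```

In this toy input the erroneous k-mers appear only once and are removed by the count threshold, while the short repeat forces the fallback from k=5 to k=7. Real data would need reverse-complement handling, coverage-aware thresholds, and support for multiple contigs, none of which are modeled here.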
Keywords/Search Tags: next-generation sequencing (NGS), insertion, sequence assembly