Methodology Study On Detection Of Indels From Next-generation Sequencing Data

Posted on: 2020-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Xu
Full Text: PDF
GTID: 2370330602952347
Subject: Computer Science and Technology
Abstract/Summary:
Insertion and deletion mutations (indels) are a common form of variation in the human genome. Accurately detecting the location and size of indels is crucial for disease prediction. With the development of next-generation sequencing technology, a growing number of algorithms for detecting insertions and deletions have been proposed. Although these methods integrate multiple signatures from short reads to improve performance, most current algorithms can only detect indels shorter than 50 bp. Because of the characteristics of next-generation sequencing data and the repetitive regions present in inserted fragments, detecting medium-length and large indels (50 bp to 10,000 bp) remains a considerable challenge. Next-generation sequencing data consists of a large number of short reads of 100 bp to 300 bp, so an insertion or deletion in the sequence makes read mapping difficult, and when the inserted fragment contains repeated regions, errors arise during sequence assembly.

The main work of this thesis is to study how to accurately detect medium-length and large insertions and deletions. To address this problem, we propose a new method, VRindel, which can detect indels of any length and also performs well in detecting the genotypes of insertion variants.

When detecting insertions, VRindel accurately determines the site of each insertion from the alignment state of the split reads. On this basis, following a left maximum matching strategy, VRindel uses unmatched reads and split reads to dynamically extend each mutation site into a virtual reference sequence. Insertions of any size can then be detected by comparing the virtual reference sequence with the original reference sequence. In addition, VRindel converts the detection of insertion genotypes into the detection of copy number states: based on a statistical model, it analyzes the coverage at each site in the virtual reference sequence to infer the copy number state of each region, and from that the genotype of the insertion.

When detecting deletions, VRindel determines candidate deletion intervals with a hierarchical clustering algorithm, extracts the split reads within each interval, and performs split alignment to determine the exact location and size of each deletion.

To verify the indel detection performance of VRindel, we performed experiments on simulated and real data and compared the results with those of eight other methods on the same data. The simulation results show that VRindel has better detection sensitivity and accuracy than the other eight methods. The results obtained on the real data are also highly consistent with those of the other methods. To verify the performance of VRindel in detecting insertion genotypes, we also compared its results with those of four other methods, and the results show that VRindel has relatively good recognition performance.
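As an illustration of the deletion step described in the abstract, the following is a minimal sketch of how read positions might be grouped into candidate deletion intervals. It is not VRindel's implementation: the abstract does not specify the linkage criterion, the distance threshold, or the minimum read support, so single linkage, `max_gap`, and `min_support` are assumptions, and the function names are hypothetical. For one-dimensional positions, single-linkage hierarchical clustering cut at a distance threshold is equivalent to splitting the sorted positions wherever two neighbours are further apart than that threshold, which is what the sketch does.

```python
from typing import List, Tuple


def cluster_positions(positions: List[int], max_gap: int = 100) -> List[Tuple[int, int, int]]:
    """Group 1-D read positions by single linkage cut at max_gap.

    Returns one (start, end, n_reads) tuple per cluster.
    max_gap is an assumed threshold, not a value from the thesis.
    """
    if not positions:
        return []
    ordered = sorted(positions)
    clusters = []
    start = prev = ordered[0]
    count = 1
    for pos in ordered[1:]:
        if pos - prev > max_gap:
            # Gap too large: close the current cluster and start a new one.
            clusters.append((start, prev, count))
            start = pos
            count = 0
        prev = pos
        count += 1
    clusters.append((start, prev, count))
    return clusters


def candidate_intervals(positions: List[int], max_gap: int = 100,
                        min_support: int = 3) -> List[Tuple[int, int]]:
    """Keep only clusters supported by at least min_support reads (assumed filter)."""
    return [(s, e) for s, e, n in cluster_positions(positions, max_gap) if n >= min_support]


if __name__ == "__main__":
    # Hypothetical split-read mapping positions on one chromosome.
    reads = [1000, 1010, 1025, 5000, 5020, 5030, 5045, 9000]
    print(candidate_intervals(reads))
```

In a pipeline of this kind, each returned interval would then be passed to the split-alignment step to refine the exact breakpoints; the isolated read at position 9000 is dropped here because it lacks supporting reads.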
Keywords/Search Tags: Next-generation Sequencing Data, Insertion Mutation, Deletion Mutation, Cluster, Dynamic Extensions, Virtual Reference Sequence