
Research On Key Algorithms Of SNP And Indel Variation Based On Third Generation Sequencing Technology

Posted on: 2021-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Liao
GTID: 2370330611998201
Subject: Software engineering
Abstract/Summary:
With the improvement of living standards, people have begun to study genes. Genetic variants are an important cause of human disease, and research on them can promote the development of basic biology and medicine. Compared with structural variations, which span large regions of the genome, SNP and Indel variants are smaller and more difficult to detect. Over time, variant detection technology has been iteratively updated, and detection performance has continuously improved. From first-generation to the current third-generation sequencing technology, earlier researchers have invested great effort and developed many methods, making outstanding contributions to human health.

To analyze third-generation sequencing data, this thesis studies SNP and Indel variant detection algorithms based on third-generation sequencing technology and proposes a variant detection pipeline. The pipeline is roughly divided into three steps. First, features are extracted from the data to find candidate variant sites and their variant information. Second, the candidate variant sites are located, and a 200 bp sequence fragment centered on each site is used for multiple sequence alignment to obtain a consensus sequence. Third, the genotype probability of each variant is calculated using Bayesian statistical methods.

The most important component is the multiple sequence alignment. The original multiple sequence alignment method carries a lot of redundant information and is based on pairwise alignment, which is costly in both time and space. This thesis instead uses a dynamic programming algorithm based on a partial order graph structure. The partial order graph discards redundant information and compresses the data without losing information, and the dynamic programming algorithm reduces the originally exponential time complexity to polynomial time complexity. The algorithm also uses single instruction, multiple data (SIMD) modules, so it can quickly generate high-quality consensus sequences.

The Bayesian model is used to calculate the genotype probability of each variant, which is classified as homozygous variant, heterozygous variant, or non-variant. The variant detection results are then compared with the reference standard in a public database, and the precision, sensitivity and F1 score of the results are calculated. With this model, the F1 score is 99.65% for SNP detection and 97.68% for Indel detection.
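The Bayesian genotyping step can be illustrated with a minimal sketch. The abstract does not give the actual likelihood model or priors, so everything here is an assumption made for illustration: a simple binomial read-count likelihood, a single per-read error rate (`error_rate`), hypothetical prior values, and the function name `genotype_posteriors`. It shows only how Bayes' rule turns allele counts at one site into probabilities for the three genotype classes named above.

```python
import math

# Hypothetical priors: most sites are assumed to be non-variant.
DEFAULT_PRIORS = {
    "non-variant": 0.998,
    "heterozygous variant": 0.001,
    "homozygous variant": 0.001,
}


def genotype_posteriors(n_ref, n_alt, error_rate=0.1, priors=None):
    """Posterior probability of each genotype class at a single site.

    n_ref / n_alt are the numbers of reads supporting the reference and
    the alternative allele. error_rate is an assumed per-read probability
    of observing the wrong allele (illustrative value, not from the thesis).
    """
    if priors is None:
        priors = DEFAULT_PRIORS

    # Probability that one read shows the alternative allele, per genotype.
    p_alt = {
        "non-variant": error_rate,
        "heterozygous variant": 0.5,
        "homozygous variant": 1.0 - error_rate,
    }

    # Unnormalized log posterior = log prior + log binomial likelihood.
    # The binomial coefficient is the same for every genotype, so it
    # cancels in the normalization and is omitted.
    log_post = {}
    for genotype, p in p_alt.items():
        log_lik = n_alt * math.log(p) + n_ref * math.log(1.0 - p)
        log_post[genotype] = math.log(priors[genotype]) + log_lik

    # Normalize in log space to avoid underflow at high coverage.
    peak = max(log_post.values())
    total = sum(math.exp(v - peak) for v in log_post.values())
    return {g: math.exp(v - peak) / total for g, v in log_post.items()}


def call_genotype(n_ref, n_alt, **kwargs):
    """Return the genotype class with the highest posterior probability."""
    post = genotype_posteriors(n_ref, n_alt, **kwargs)
    return max(post, key=post.get)
```

Under these assumptions, a site covered by 15 reference reads and 15 alternative reads is called heterozygous, while a site with 30 alternative reads and no reference reads is called homozygous variant. A real caller for third-generation reads would likely need per-base qualities and a separate error model for Indels, which this sketch leaves out.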
Keywords/Search Tags: SNP, Insertion, Deletion, MSA