
Individual Gene Variants Detection Algorithm Based On Population Genomic Information

Posted on: 2021-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Wu
GTID: 2404330611498168
Subject: Computer technology
Abstract/Summary:
The human genome contains important information about human evolution, heredity, and disease. Since the emergence of modern life science, the study of genomic data and genetic variation has been a major topic in academia. Advances in sequencing technology have shaped the design of variant detection algorithms: with the rapid development and wide application of high-throughput sequencing, more and more variant detection algorithms based on high-throughput sequencing data have been proposed. However, because genomic data are highly repetitive and high-throughput sequencing has its own limitations, genetic variation detection still faces great challenges. As genome sequencing and the detection and analysis of genomic variation play an increasingly important role in addressing disease and other problems, detecting the types of variation in the genome has become a key issue in bioinformatics research.

To develop a variant detection algorithm for high-throughput sequencing data, this thesis surveys the current state of genome sequencing technology, the main ideas behind genome assembly algorithms, and the status of genomic variant detection techniques. It then proposes an individual genetic variation detection algorithm based on population genomic information to detect the types of variation present in an individual genome. The main research work is as follows:

(1) Identifying regions that contain unknown variants in genomic data. Regions of variation are identified with a sliding-window mechanism so that the type of variation within each region can then be determined. The genomic data are divided into a series of contiguous sliding windows, with read coverage ensured in every window. The proportion of variant positions in each window is computed to obtain a sliding-window variation ratio curve, from which the regions of variation are derived. Known variation information is then used to obtain the regions that contain unknown variants.

(2) Designing a local genome assembly algorithm for the regions that contain unknown variants. Through analysis and modeling, the genome assembly problem is transformed into a specific string problem. A genome assembly algorithm based on unikmers is proposed. It exploits the fact that a unikmer is unique in the genome and therefore has a determined position: comparing unikmers with the reads gives the position of each read relative to the reference genome, which makes it easier to determine the positions of the reads relative to one another.

(3) Detecting genetic variation information. A set of contigs is obtained by identifying the regions of variation and assembling the genomic data within them. The variation information is then obtained by comparing the contigs with the reference genome.

In summary, this thesis proposes an individual genetic variation detection algorithm based on population genomic information. The algorithm uses a sliding-window mechanism to detect regions of unknown variation and uses unikmers for local genome assembly. It is of guiding significance for variant detection and analysis of genomic data obtained by high-throughput sequencing.
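
Steps (1) and (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no parameters or data structures, so the function names, the window size, step, and threshold, the merging rule, and the choice to discard known variant positions before computing the ratio curve are all assumptions made for illustration. A "unikmer" is taken here to mean a k-mer that occurs exactly once in the reference.

```python
from collections import Counter


def window_ratios(length, variant_positions, window, step):
    """Return (start, ratio) per window; ratio = share of flagged positions.

    Assumes positions are 0-based and that `variant_positions` already
    excludes known (population) variants.
    """
    flagged = set(variant_positions)
    ratios = []
    for start in range(0, max(length - window, 0) + 1, step):
        hits = sum(1 for p in range(start, start + window) if p in flagged)
        ratios.append((start, hits / window))
    return ratios


def candidate_regions(ratios, window, threshold):
    """Merge touching or overlapping windows with ratio >= threshold.

    Returns half-open (start, end) intervals. The threshold rule is an
    assumption; the abstract only says regions come from the ratio curve.
    """
    regions = []
    for start, ratio in ratios:
        if ratio < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def unique_kmers(reference, k):
    """Map each k-mer that occurs exactly once in `reference` to its position."""
    n = len(reference) - k + 1
    counts = Counter(reference[i:i + k] for i in range(n))
    return {
        reference[i:i + k]: i
        for i in range(n)
        if counts[reference[i:i + k]] == 1
    }


def anchor_read(read, uniq, k):
    """Return the reference offset implied for the read's first base.

    Uses the first unique k-mer found in the read; returns None if the
    read contains no unique k-mer.
    """
    for j in range(len(read) - k + 1):
        pos = uniq.get(read[j:j + k])
        if pos is not None:
            return pos - j
    return None


if __name__ == "__main__":
    # Toy data: positions where reads disagree with the reference,
    # and positions already known from population data.
    observed = [2, 10, 11, 12]
    known = {2}
    novel = [p for p in observed if p not in known]

    curve = window_ratios(20, novel, window=5, step=5)
    print(candidate_regions(curve, window=5, threshold=0.4))

    uniq = unique_kmers("ACGTACGTTGCA", k=4)
    print(anchor_read("ACGTTGC", uniq, k=4))
```

With these toy inputs, only the window starting at position 10 reaches the threshold, and the read is anchored at reference offset 4 through the unique k-mer `CGTT`. The repeated k-mer `ACGT` is skipped because its position in the reference is ambiguous. Reads anchored this way share a common coordinate system, so their relative order follows directly from their offsets.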
Keywords/Search Tags:Genomic variation, Second-generation genome sequencing, Genome partial assembly