Research And Realization Of The Algorithm Of Similar Genome Alignment Based On MECAT

Posted on: 2021-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Y He
GTID: 2370330614971118
Subject: Software engineering
Abstract/Summary:
The significance of gene sequencing is that it enables humans to understand the fundamental causes of diseases, to treat them properly, and to prevent them early. For example, diseases such as tumors and lupus erythematosus are related to genetic mutations. If the mutation site can be identified through sequencing, this is of great significance for precise treatment and for overcoming difficult cases. Third-generation sequencing technologies include PacBio's SMRT technology and Oxford Nanopore Technologies' nanopore single-molecule sequencing technology. The reads they produce are 10-15 kbp long, the sequencing cost is low, and the average sequence error rate is 15%, but the errors are unbiased, which broadens the range of applications. We call a sequence obtained by third-generation sequencing a long read.

The MECAT algorithm is an alignment method for third-generation sequencing data. It can quickly align long reads to a genome, but the number of aligned bases and the coverage are both low. In this paper, two optimizations of the MECAT algorithm are proposed. They exploit two characteristics of long reads: the differences between a similar reference genome and the long reads, and the fact that each long read comes from a unique position on the genome. The optimization is divided into two modules: (1) Based on the differences between the similar reference genome and the long reads, the concept of similarity is proposed. The genome is first divided into blocks, and the similarity between each genome region and the long read is calculated. The calculated similarity is added to MECAT's candidate-alignment calculation, which changes the position of the candidate alignment center and allows more long reads to align better to the genome. (2) The alignment results already produced are used to filter redundant results. In the alignment results, a long read may be aligned to multiple regions of a similar genome. A distance ratio is calculated from the distances between these regions on the genome.

In the current experiments, compared with the MECAT algorithm, the number of aligned bases for E. coli increases by 4%-8% and the coverage increases by 9%-12%. For yeast, the number of aligned bases increases by 19%-130% and the coverage increases by about 5%. For A. thaliana, the number of aligned bases increases by 22%-25% and the coverage increases by 20%-30%. Multiple sets of experiments show that the optimizations achieve a good improvement in the alignment results of MECAT.
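
The two modules described above can be illustrated with a minimal Python sketch. The abstract does not give the formulas for the similarity or the distance ratio, so everything here is an assumption made for illustration: similarity is taken as the fraction of the read's k-mers that occur in a genome block, the distance ratio is taken as the distance between two alignment start positions divided by the read length, and the names `block_similarity`, `distance_ratio`, `filter_redundant`, and the `min_ratio` threshold are hypothetical. This is not the thesis's implementation and it does not use MECAT's actual code.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def block_similarity(read, genome, block_size, k):
    """Score each fixed-size genome block against the read.

    ASSUMPTION: similarity = fraction of the read's distinct k-mers that
    also occur in the block. The thesis may define it differently.
    K-mers that span a block boundary are ignored in this sketch.
    """
    read_kmers = kmers(read, k)
    scores = []
    for start in range(0, len(genome), block_size):
        block = genome[start:start + block_size]
        shared = len(read_kmers & kmers(block, k))
        scores.append(shared / len(read_kmers) if read_kmers else 0.0)
    return scores


def distance_ratio(start_a, start_b, read_length):
    """ASSUMPTION: distance between two alignment starts, relative to read length."""
    return abs(start_a - start_b) / read_length


def filter_redundant(alignments, read_length, min_ratio=1.0):
    """Filter the alignments of ONE read, given as (genome_start, score) pairs.

    Alignments are visited from highest to lowest score. An alignment is
    kept only if its distance ratio to every already-kept alignment is at
    least min_ratio (a hypothetical threshold); otherwise it is treated
    as a redundant hit on the same region and dropped.
    """
    kept = []
    for start, score in sorted(alignments, key=lambda a: a[1], reverse=True):
        if all(distance_ratio(start, s, read_length) >= min_ratio for s, _ in kept):
            kept.append((start, score))
    return kept


# Toy data: the read matches the second 10-base block of the genome exactly.
genome = "ACGTACGTTTGGCCAATTGG"
read = "TTGGCCAA"
scores = block_similarity(read, genome, block_size=10, k=3)

# Three alignments of one 1000-base read; the second overlaps the first.
hits = filter_redundant([(100, 50), (120, 40), (5000, 30)], read_length=1000)
```

In this toy run, the block with the highest score would be the preferred candidate region for the read, and the lower-scoring alignment that starts only 20 bases away from the best one is discarded as redundant. A real implementation would work on MECAT's own candidate and alignment records rather than on plain tuples.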
Keywords/Search Tags:Alignment, Long reads, Similar genome