Research And Optimization On Sequence Alignment Algorithms For DNA Methylation Sequences

Posted on: 2022-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Liu
GTID: 2518306323962389
Subject: Computer software and theory
Abstract/Summary:
DNA methylation plays an important role in the diagnosis and treatment of many diseases. The most important step in detecting it is to determine the mapping positions of the DNA methylation sequences in the reference genome; the next step is to identify which Cs in the reference genome are methylated. Bisulfite sequencing converts the unmethylated nucleotide C into the nucleotide T, which reduces the information complexity of the sequences and significantly increases the probability of mapping to multiple positions. Guide Positioning Sequencing generates paired-end sequences consisting of conventional sequences and methylation sequences. The existing alignment algorithm for this data uses dynamic programming to determine the mapping positions of large-scale methylation sequences, which consumes a lot of time. How to align both types of methylation sequencing data to the reference genome accurately and quickly is therefore of significant importance in bioinformatics. This thesis focuses on determining alignment positions for bisulfite sequencing data and on optimizing the alignment algorithm for Guide Positioning Sequencing data. The main research content and contributions are as follows:

1. Research on determining alignment positions for bisulfite sequencing data

When bisulfite-treated sequences are aligned to the reference genome, sequences that align to multiple positions are called multireads. These sequences are discarded in downstream analysis, which wastes information. To identify the best mapping position of each multiread, existing Bayesian methods calculate the probability of the multiread at each candidate position by considering how it overlaps with uniquely mapped reads. However, a large portion of multireads do not overlap with any unique reads, and existing methods cannot determine a unique position for them. This thesis proposes a method for determining alignment positions for bisulfite sequencing data. For multireads that overlap unique reads, a comprehensive scoring strategy is used that takes into account sequence similarity, bisulfite treatment, and the probability of sequencing error; the position with the highest score is the best alignment position. For multireads that do not overlap unique reads, the best alignment position is determined from the coverage obtained after all sequences have been aligned to the reference genome. The experimental results show that, compared with the Bayesian method, our method improves recall by 14% and increases accuracy from 87.35% to 91.71%.

2. Optimization of the sequence alignment algorithm for Guide Positioning Sequencing data

In the existing alignment process for Guide Positioning Sequencing data, the mapping positions of the conventional sequences in the reference genome are determined first. Then, according to the positional relationship between the paired-end sequences, the alignment range of the methylation sequences on the reference genome is determined. Finally, a dynamic programming algorithm is applied to determine the optimal alignment position of each methylation sequence, which is the position with the highest local similarity. This process consumes too much time, and the proportion of sequences mapped to a unique position is not high. This thesis proposes an improved algorithm that aligns the methylation sequences directly to the reference genome. If a methylation sequence is mapped to a unique position, no subsequent processing is carried out. If it is mapped to multiple positions, the optimal alignment position is determined by the positional relationship between the paired-end sequences. Because existing alignment algorithms for methylation sequences are accurate and fast, aligning the methylation sequences directly to the reference genome reduces the workload. The experimental results on a real data set and a simulated data set show that the accuracy of our method is not significantly different from that of the existing method, while the proportion of sequences mapped to a unique position is higher and the time performance improves by about a factor of 3.
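
The decision rule of the improved procedure for Guide Positioning Sequencing data described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `resolve_methylation_read`, the `max_distance` threshold, and the use of plain integer coordinates on a single chromosome are all assumptions made for illustration, and the abstract does not say how a read is handled when the paired sequence fails to single out one candidate (here such reads are simply left unresolved).

```python
from typing import Optional, Sequence


def resolve_methylation_read(
    candidates: Sequence[int],
    mate_position: Optional[int],
    max_distance: int = 500,  # assumed bound on the distance between mates
) -> Optional[int]:
    """Pick one mapping position for a methylation read, or None.

    candidates    -- positions reported for the methylation read by a
                     direct alignment to the reference genome
    mate_position -- position of the paired conventional read, or None
                     if that read did not map
    """
    if not candidates:
        return None  # the read did not map at all
    if len(candidates) == 1:
        return candidates[0]  # uniquely mapped: no further processing
    if mate_position is None:
        return None  # multiread with no paired read to anchor it
    # Multiread: keep only candidates close enough to the paired read.
    near = [p for p in candidates if abs(p - mate_position) <= max_distance]
    # Assumption: accept the position only if exactly one candidate remains.
    return near[0] if len(near) == 1 else None


if __name__ == "__main__":
    print(resolve_methylation_read([1200], None))
    print(resolve_methylation_read([1200, 98000], 1000))
    print(resolve_methylation_read([1200, 1300], 1000))
```

In this sketch the paired conventional read is consulted only for multireads, which mirrors the reason the abstract gives for the reduced workload: uniquely mapped methylation sequences need no further processing.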
Keywords/Search Tags: DNA Methylation, Sequence Alignment, Bisulfite Sequencing, Guide Positioning Sequencing