A Long Read Hybrid Error Correction Algorithm Based On Segmented PHMM

Posted on:2022-07-16

Degree:Master

Type:Thesis

Country:China

Candidate:L Y Hu

Full Text:PDF

GTID:2480306335496724

Subject:Organic Chemistry

Abstract/Summary:

Although next-generation sequencing offers high throughput and accuracy, its short read length means that reads cannot span repeat regions, which makes downstream analysis difficult when the data volume is large. Third-generation DNA sequencing produces longer reads and compensates for some of the weaknesses of second-generation sequencing, but it also introduces more erroneous bases, so read accuracy is only about 85%. Researchers therefore usually combine the strengths of short reads and long reads, using short reads to correct long reads so that accuracy improves while as much of the read length as possible is preserved. Even with an error rate of about 15%, correct bases still make up the large majority of a long read (LR). Running the forward-backward probability calculation and Viterbi decoding over the whole long read therefore not only lengthens the running time but also performs redundant computation. On the basis of the Hercules algorithm, we propose a new hybrid error correction scheme: the well-matched parts of a long read are left unprocessed, and only the uncovered or poorly matched parts are corrected by the algorithm based on the profile hidden Markov model (pHMM), which reduces the running time while maintaining high accuracy. Error correction is divided into two parts: preprocessing based on short-read alignment, and error correction based on the pHMM. The advantage of this scheme is that, building on Hercules, it combines the pHMM with the aligner, which both reduces the dependence on aligner performance to a certain extent and shortens the running time compared with applying the pHMM to the whole read. To evaluate the method, we applied it to an E. coli data set and a Saccharomyces cerevisiae data set. Compared with the uncompressed Hercules error correction stage, accuracy was maintained while the running time was reduced by 65% for E. coli and by a factor of 4.7 for Saccharomyces cerevisiae.
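
The segmentation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the `(start, end, identity)` alignment tuples, and the `min_identity` threshold of 0.95 are all assumptions made for illustration. Given the short-read alignments on one long read, the sketch treats regions covered by a sufficiently good alignment as trusted and returns only the remaining intervals, which are the ones that would be passed to the pHMM-based correction.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the long read


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def segments_to_correct(
    read_length: int,
    alignments: List[Tuple[int, int, float]],
    min_identity: float = 0.95,  # hypothetical threshold, not from the thesis
) -> List[Interval]:
    """Return the long-read intervals that are uncovered or poorly matched.

    `alignments` holds (start, end, identity) for each short read aligned to
    the long read, with coordinates assumed to lie within [0, read_length).
    """
    # Keep only well-matched alignments; these regions are left unprocessed.
    trusted = merge_intervals(
        [(s, e) for s, e, identity in alignments if identity >= min_identity]
    )
    todo: List[Interval] = []
    cursor = 0
    for start, end in trusted:
        if start > cursor:
            todo.append((cursor, start))  # gap before this trusted region
        cursor = max(cursor, end)
    if cursor < read_length:
        todo.append((cursor, read_length))  # tail after the last trusted region
    return todo


if __name__ == "__main__":
    example = [(0, 30, 0.99), (25, 50, 0.97), (50, 70, 0.80), (80, 100, 0.98)]
    print(segments_to_correct(100, example))
```

In this example only positions 50 to 80 would be handed to the pHMM step, because the alignment covering 50 to 70 falls below the assumed identity threshold and positions 70 to 80 are not covered at all. Restricting the forward-backward and Viterbi computations to such intervals is what the abstract credits for the shorter running time.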

Keywords/Search Tags:

Bioinformatics, DNA sequence analysis, pHMM, Error correction

Related items

1	Analysis Of Coding Features Of DNA Sequences Based On Error-Correction Coding Theory
2	Error Correction Of NGS Gene Based On Multiple Sequence Alignment
3	Quantum Logic Gate Sequence And Quantum Error Correction With Continuous Variables
4	Cloud Computation-Based Error Correction For Transcriptome Assembly
5	Research On The Construction And Sequence Splicing Parallel Optimization Method Of The Second And Third Generation Genome Hybrid Assembly Process
6	Sequence Assembly Algorithms For Next-generation Sequencing Technology Research
7	Standard Quantum Error Correction And Operator Quantum Error Correction
8	Error Correction Of Lightning Location And Analysis Of Lightning Activity Characteristics In Shenzhen Area
9	The Genomic Sequence Clone, Bioinformatics And Express Analysis Of Cbr In Dunaliella Salina
10	Algorithmic Study On Long Read Assembly Error Correction Based On Linked Reads Sequencing Data