Compression Of DNA Sequences Based On Reference Sequences And Weighting Of Context Models

Posted on: 2018-04-08 | Degree: Master | Type: Thesis
Country: China | Candidate: R S Wang | Full Text: PDF
GTID: 2370330518954924 | Subject: Communication and Information System
Abstract/Summary:
As our understanding of the characteristics of DNA sequences deepens, there is a pressing need for efficient compression. Exploiting the high similarity of DNA sequences among homologous species, GReEn uses a reference sequence to build probabilistic copy models and an arithmetic encoder to encode DNA, and achieves strong compression performance. However, its performance declines sharply when the target sequence differs from the reference sequence. This thesis uses weighted context models to address this problem. First, we build a hash table and use linked lists to store every k-mer string in the reference sequence, then process the target sequence in the same way and compare it with the reference sequence. The positions where the target differs from the reference are then encoded with the weighted context models. Following Minh D. C.'s theory, which holds that the weights of context models are positively related to the reciprocal log of the description length, we propose multi-group weighted context models to reduce the code length. We sort and count the description lengths of each model, then calculate the logarithm and derivation of these statistics. Finally, we update the weights using the statistical characteristics of the description length. The experimental results show that weighting the context models improves compression efficiency when the target sequence differs from the reference sequence. They also show that combining a reference sequence with weighted context models improves compression efficiency in DNA compression.
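The k-mer indexing and comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (`build_kmer_index`, `longest_match`, `split_target`, `reconstruct`), the greedy longest-match strategy, and the segment format are all assumptions made for illustration, and a Python dict of lists stands in for the hash table with linked lists. The sketch only separates the target into regions copied from the reference and regions that differ; the differing ("literal") regions are the ones the thesis would pass on to the weighted context models, which are not shown here.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def longest_match(index, reference, target, pos, k):
    """Return (ref_pos, length) of the longest match for target[pos:], or None."""
    seed = target[pos:pos + k]
    if len(seed) < k or seed not in index:
        return None
    best = None
    for ref_pos in index[seed]:
        length = k
        # Extend the seed match base by base while both sequences agree.
        while (ref_pos + length < len(reference)
               and pos + length < len(target)
               and reference[ref_pos + length] == target[pos + length]):
            length += 1
        if best is None or length > best[1]:
            best = (ref_pos, length)
    return best


def split_target(reference, target, k):
    """Split target into ('copy', ref_pos, length) and ('literal', text) segments."""
    index = build_kmer_index(reference, k)
    segments, literal, pos = [], [], 0
    while pos < len(target):
        match = longest_match(index, reference, target, pos, k)
        if match is None:
            # No k-mer seed in the reference: this base differs from the reference.
            literal.append(target[pos])
            pos += 1
        else:
            if literal:
                segments.append(("literal", "".join(literal)))
                literal = []
            segments.append(("copy", match[0], match[1]))
            pos += match[1]
    if literal:
        segments.append(("literal", "".join(literal)))
    return segments


def reconstruct(reference, segments):
    """Rebuild the target from the reference and the segment list."""
    parts = []
    for seg in segments:
        if seg[0] == "copy":
            parts.append(reference[seg[1]:seg[1] + seg[2]])
        else:
            parts.append(seg[1])
    return "".join(parts)
```

For example, with the made-up sequences `reference = "ACGTACGTTTGA"`, `target = "ACGTACGATTGA"` and `k = 4`, `split_target` yields two copy segments separated by a single literal base, and `reconstruct` recovers the target exactly.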
Keywords/Search Tags: DNA compression, Context weighted, reference sequence, description length, arithmetic encoder