
Parallel Optimization On Compression Algorithm For Genomic Data

Posted on: 2020-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: B X Ke
GTID: 2370330620458510
Subject: Computer Science and Technology
Abstract/Summary:
With the development of next-generation sequencing technology and its widespread application in fields such as drug development and disease diagnosis, the volume of sequencing data has grown exponentially. Effective compression algorithms are needed to shrink these huge volumes of data and thereby reduce the cost of storage and transmission. To meet this demand, researchers have proposed a variety of specialized compressors for genomic data. Although these methods effectively improve the compression ratio of sequencing data, they suffer from low compression speed, which limits their use in practical applications. At the same time, multi-core processors and vector instructions are becoming standard features of modern hardware, and accelerators such as GPUs are more accessible than ever before. Modern hardware can therefore be used to parallelize compression algorithms for genomic data and make them fast enough for practical use.

Against this background, this thesis studies the parallelization of LCQS, a specialized compression algorithm for the quality scores of genomic data, on modern hardware, and proposes two optimization schemes: one for multi-core CPUs and one for CPU-GPU heterogeneous environments.

In the multi-core environment, the thesis implements a parallel LCQS using multiple threads and a shared queue, combining data parallelism with pipelining. A light-weight index structure is introduced to support fast random-access decompression for downstream applications. In addition, vector instructions are used for fine-grained parallelization of the PAQ compressor, a time-consuming key module. The vectorized PAQ is portable and can be integrated into other compressors, including LCQS, to accelerate them. While maintaining a high compression ratio, the parallel LCQS is faster than both general-purpose compressors and specialized compressors of the same category in compression speed and in random-access decompression speed. The optimization scheme achieves a high speedup and good scalability.

In the CPU-GPU heterogeneous environment, the compute-intensive encoding stage is migrated to the GPU. The data structures and parameters of the ported algorithm are adapted to the characteristics of the algorithm and of the GPU hardware. Tuning methods such as loop unrolling and cache configuration optimization are applied to improve memory throughput and reduce memory latency. Experiments show that these tuning methods greatly improve the speed of the encoding process on the GPU, and that the accelerated GPU encoding contributes significantly to the overall performance of the algorithm.
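The multi-core scheme described above (data-parallel compression plus a light-weight index for random-access decompression) can be illustrated with a minimal Python sketch. This is not the thesis implementation: `zlib` stands in for LCQS, and `BLOCK_SIZE`, `compress_blocks`, and `read_block` are hypothetical names chosen for illustration. The sketch only shows the general idea that independently compressed blocks, together with a table of offsets, allow any single block to be decoded without touching the rest.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1 << 16  # hypothetical block size, not taken from the thesis


def compress_blocks(data: bytes, workers: int = 4):
    """Compress fixed-size blocks in parallel; return (payload, index).

    zlib is only a stand-in for the real compressor (LCQS in the thesis).
    The index holds one (offset, length) pair per compressed block.
    """
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so block order is preserved
        compressed = list(pool.map(zlib.compress, blocks))

    index, offset = [], 0
    for chunk in compressed:
        index.append((offset, len(chunk)))
        offset += len(chunk)
    return b"".join(compressed), index


def read_block(payload: bytes, index, block_id: int) -> bytes:
    """Random access: decompress a single block located via the index."""
    offset, length = index[block_id]
    return zlib.decompress(payload[offset:offset + length])


if __name__ == "__main__":
    sample = b"IIIIHHHGGFFFEEDD" * 20000  # toy quality-score-like input
    payload, index = compress_blocks(sample)
    third = read_block(payload, index, 2)
    print(len(index), "blocks;", len(third), "bytes in block 2")
```

The trade-off in a design of this kind is that smaller blocks give finer-grained random access and more parallelism, while larger blocks usually give the compressor more context and a better ratio.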
Keywords/Search Tags: Data compression, Parallel optimization, Multicore processor, Heterogeneous computing, Single Instruction Multiple Data