With the development of computer technology, the amount of data that computer systems must process keeps growing, and data compression has become an important technique for dealing with it. However, most current compression algorithms run only on the CPU and cannot fully exploit the parallel computing power of the GPU. Parallelization of compression algorithms has progressed slowly: some algorithms are still single-threaded, and GPU compression algorithms used in practice are even scarcer. This paper therefore proposes a parallel compression algorithm that can be implemented on NVIDIA GPUs. For text files, the algorithm achieves not only a good compression ratio but also a high compression speed: on the enwik8 dataset it reaches a compression ratio of 0.241 at a compression speed of 140 MB/s. Compared with common compression software that reaches a similar compression ratio, its compression speed on this dataset is 36 times that of 7z and 3.7 times that of rar. To achieve this performance, the paper mainly carries out the following research:

(1) Several methods are explored to improve the compression ratio under a limited computation budget. To refine the choice of context order, decision-tree modeling is applied to the N-order contexts, which gives better results than using the N-order contexts alone. The probabilities produced by the N-order context model and by the decision tree are then fused with learned weights, and the weights are trained with stochastic gradient descent to further improve the compression ratio. Because contexts differ in how often they have been sampled and in their information entropy, the SGD-based weighted fusion is refined further: a two-dimensional weight matrix indexed by sampling count and information entropy is introduced, which yields a better compression ratio. With these optimizations, the compression ratio on the enwik8 dataset surpasses that of the commonly used compression formats.

(2) The compression algorithm is implemented on the GPU, with compression performed by a BWT transform followed by order-0 entropy coding. To speed up string comparison, the BWT is computed with the Manber-Myers doubling algorithm for suffix-array construction. To reduce the cost of repeatedly re-sorting a small amount of data, the original algorithm is improved by sorting that data with merge sort, making suffix-array sorting 2.9 times faster than the original. Range coding is then implemented on the GPU: decoding is accelerated with a binary search, and both encoding and decoding are accelerated by reducing the number of memory accesses. To cope with each thread having less shared memory than the full symbol alphabet requires, a limited code table is used to reduce shared-memory occupancy and speed up the frequency statistics. To weight historical information of different lengths, the statistics are updated with an exponential function, which improves the compression ratio. The statistics and codec speeds were measured: statistics plus encoding reaches 2.38 GB/s, and statistics plus decoding reaches 1.20 GB/s.
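A minimal sketch of the weighted fusion described in (1), assuming that each model (the N-order context model and the decision tree) outputs a probability for the next bit and that the mixing weights are trained online by stochastic gradient descent on the coding (cross-entropy) loss. The abstract does not specify the exact fusion, so the linear-mixing form, the bit-level formulation, and all names below are illustrative assumptions, not the thesis code.

```python
import random

def mix_and_update(probs, weights, bit, lr=0.02):
    """One step of weighted fusion of model probabilities with an SGD update.

    probs   : P(next bit = 1) from each model, e.g. the N-order context model
              and the decision-tree model (illustrative bit-level formulation)
    weights : current mixing weights, modified in place
    bit     : the bit actually observed (0 or 1)
    Returns the mixed probability that the entropy coder would use.
    """
    p = sum(w * q for w, q in zip(weights, probs))
    p = min(max(p, 1e-6), 1.0 - 1e-6)            # keep the coding probability valid
    # Code-length (cross-entropy) loss: L = -log p if bit == 1 else -log(1 - p)
    # dL/dw_i = -q_i / p        (bit == 1)
    # dL/dw_i =  q_i / (1 - p)  (bit == 0)
    for i, q in enumerate(probs):
        grad = -q / p if bit == 1 else q / (1.0 - p)
        weights[i] = max(weights[i] - lr * grad, 1e-4)   # SGD step, keep weights positive
    s = sum(weights)
    for i in range(len(weights)):                # renormalise so the mix stays a probability
        weights[i] /= s
    return p

# Toy run: bits have P(1) = 0.7; model 0 predicts 0.7, model 1 predicts 0.4,
# so SGD should shift most of the mixing weight onto model 0.
random.seed(0)
weights = [0.5, 0.5]
for _ in range(5000):
    bit = 1 if random.random() < 0.7 else 0
    mix_and_update([0.7, 0.4], weights, bit)
print(weights)
```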
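The BWT stage in (2) relies on suffix-array construction by Manber-Myers prefix doubling. The sketch below shows the idea as single-threaded CPU code; the thesis implements it in parallel on the GPU and improves part of the sorting with merge sort, and the helper names here are assumptions for illustration only.

```python
def suffix_array(s):
    """Suffix array by Manber-Myers prefix doubling; s is a sequence of ints."""
    n = len(s)
    sa = list(range(n))
    rank = list(s)                      # rank of each suffix by its first symbol
    tmp = [0] * n
    k = 1
    while True:
        # Key of suffix i: (rank of its first k symbols, rank of the next k symbols).
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        # Re-rank: suffixes with equal keys receive equal ranks.
        tmp[sa[0]] = 0
        for j in range(1, n):
            tmp[sa[j]] = tmp[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = tmp[:]
        if rank[sa[-1]] == n - 1:       # all suffixes are now distinguished
            break
        k *= 2
    return sa

def bwt(data, sentinel=0):
    """Burrows-Wheeler transform via the suffix array (sentinel marks the end)."""
    s = list(data) + [sentinel]         # assumes the sentinel byte does not occur in data
    sa = suffix_array(s)
    return bytes(s[i - 1] for i in sa)  # byte preceding each sorted suffix

print(bwt(b"banana"))                   # b'annb\x00aa'
```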
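The abstract states that range decoding is accelerated with a binary search. One plausible reading is that the decoder locates the next symbol by binary search over a cumulative-frequency table instead of scanning it linearly; the following sketch illustrates only that lookup, with made-up toy frequencies.

```python
import bisect
import itertools

def cumulative_table(freqs):
    """cum[i] = total frequency of all symbols < i (length len(freqs) + 1)."""
    return [0] + list(itertools.accumulate(freqs))

def decode_symbol(cum, scaled_value):
    """Locate the symbol whose interval [cum[s], cum[s+1]) contains scaled_value.

    A naive range decoder scans the table linearly; binary search over the
    cumulative table finds the symbol in O(log |alphabet|) steps instead.
    """
    return bisect.bisect_right(cum, scaled_value) - 1

freqs = [5, 1, 3, 7]                 # toy alphabet of four symbols
cum = cumulative_table(freqs)        # [0, 5, 6, 9, 16]
assert decode_symbol(cum, 0) == 0
assert decode_symbol(cum, 5) == 1
assert decode_symbol(cum, 8) == 2
assert decode_symbol(cum, 15) == 3
```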
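The exponential update of the order-0 statistics is not spelled out in the abstract. One common formulation that weights recent history more heavily is exponentially decayed counts, sketched below; the decay and increment values are illustrative, not the thesis parameters.

```python
def update_counts(counts, symbol, decay=0.999, inc=1.0):
    """Exponentially decayed symbol counts for an adaptive order-0 model.

    Every existing count is multiplied by `decay` before the observed symbol's
    count is incremented, so recent history outweighs old history.
    """
    for s in counts:
        counts[s] *= decay
    counts[symbol] = counts.get(symbol, 0.0) + inc

def probability(counts, symbol, alphabet_size=256):
    """Laplace-smoothed probability the order-0 entropy coder would assign."""
    total = sum(counts.values()) + alphabet_size
    return (counts.get(symbol, 0.0) + 1.0) / total

counts = {}
for b in b"abracadabra":
    p = probability(counts, b)              # probability used to code this byte
    print(chr(b), round(p, 4))
    update_counts(counts, b)
```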