
Research On Improving The Efficiency Of Hardware-Based Lossless Data Compression By Using Huffman Encoding

Posted on: 2019-04-17  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Q Dong  Full Text: PDF
GTID: 1368330626950325  Subject: Electronic Science and Technology
Abstract/Summary:
In the era of big data, data compression can effectively reduce the storage and transmission-bandwidth requirements of a wide range of data applications. Lossless compression reduces the amount of data without degrading the quality of the reconstructed information, and has therefore been widely used throughout computing. Huffman coding, an entropy coding technique, is widely used in lossless data compression. It is a coding method based on the statistical frequencies of characters and a binary-tree data structure: the tree is built "from the bottom up", and each symbol in the data to be encoded is assigned a code such that the average code length is minimal. This guarantees excellent compression-ratio performance, which is why Huffman coding is often referred to as optimal coding.

With the development of lossless-compression applications and of electronic-circuit design methods, hardware-based lossless data compression has also received increasing attention. Compared with traditional software-based compression, a well-designed dedicated compression module has clear advantages in resource utilization, computing efficiency, and power consumption. Existing hardware-based compression systems, however, tend to pursue system security or high data throughput at the expense of compression ratio. This thesis therefore targets the shortcomings of existing hardware Huffman-coding algorithms and improves coding efficiency by exploiting the parallel computing and pipeline processing available in hardware modules.
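As background for the hardware designs summarized below, a minimal software sketch of the classic bottom-up Huffman construction might look as follows. This illustrates only the textbook algorithm, not code from the thesis; the toy alphabet size and frequencies are assumptions.

```c
#include <stdio.h>

#define SYMS 8  /* toy alphabet size; illustrative only */

/* Classic bottom-up Huffman construction: repeatedly merge the two
 * least-frequent live nodes into a new internal node, then read each
 * symbol's code length off the tree as its depth below the root. */
int main(void) {
    long freq[2 * SYMS] = {5, 9, 12, 13, 16, 45, 3, 7}; /* toy counts */
    int parent[2 * SYMS] = {0};
    int live[2 * SYMS] = {0};
    for (int i = 0; i < SYMS; i++) live[i] = 1;

    int next = SYMS;                  /* index of next internal node */
    for (int step = 0; step < SYMS - 1; step++, next++) {
        int lo1 = -1, lo2 = -1;       /* two smallest live nodes */
        for (int i = 0; i < next; i++) {
            if (!live[i]) continue;
            if (lo1 < 0 || freq[i] < freq[lo1]) { lo2 = lo1; lo1 = i; }
            else if (lo2 < 0 || freq[i] < freq[lo2]) { lo2 = i; }
        }
        freq[next] = freq[lo1] + freq[lo2];  /* merged internal node */
        parent[lo1] = parent[lo2] = next;
        live[lo1] = live[lo2] = 0;
        live[next] = 1;
    }

    int root = next - 1;
    for (int s = 0; s < SYMS; s++) {  /* code length = depth of leaf */
        int len = 0;
        for (int v = s; v != root; v = parent[v]) len++;
        printf("symbol %d: freq %ld, code length %d\n", s, freq[s], len);
    }
    return 0;
}
```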
The research work in this thesis achieves the following results (results (2) through (5) are sketched in software after this list):

(1) A file-partitioning method for Huffman coding is proposed to ease the conflict between throughput requirements and memory-resource limits. By statistically measuring and comparing the time required by each sub-task of the encoding process over data partitions of various typical sizes, it is found that pipelined operation only pays off when the data is compressed block by block with a partition size larger than 25.2 KB. Fixed partition sizes such as 32 KB and 64 KB are therefore adopted for the hardware encoding system.

(2) A method for sorting nodes by frequency is proposed. The sorting algorithm combines the advantages of quicksort and heapsort so as to fully exploit hardware parallelism: the character nodes are first assigned to 3 (or 2) intervals according to the statistical frequency distribution, and the nodes within each interval are then sorted in parallel. Experiments show that the proposed hardware structure increases the sorting rate by more than 2 times.

(3) A hardware algorithm for constructing a dynamic Huffman tree, together with a corresponding storage structure, is proposed. Built on block memories and distributed registers, the dynamic Huffman tree uses a new parent-node-to-child-node mapping that allows codes to be allocated to leaf nodes in hierarchical (level) order. Codes can therefore be allocated efficiently to the leaf nodes of the Huffman tree without increasing hardware resource consumption. Experiments show that the rate of building the Huffman tree and generating node codes increases by more than 4 times.

(4) A method for processing overflow nodes is proposed. The method adjusts overflowing leaf nodes while preserving the integrity of the compressed data. By exploiting the parallel computing of hardware systems, it improves the processing rate by more than 2 times compared with the traditional method of adjusting overflow nodes one by one. An efficient variable-length-code splicing algorithm and its hardware architecture are also proposed; test results show an average throughput close to 40 MB/s.

(5) When compressing data, a dictionary compression algorithm is usually applied as well, to achieve a better compression ratio. A high-speed, high-hit-rate match-search algorithm for hardware platforms, together with the corresponding hardware structure, is proposed. Test results show an average throughput of 93.22 MB/s, 1.49 times higher than the traditional data-matching method, which secures the data-throughput performance of the whole system.
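Result (2)'s interval idea, sketched in software. This is a minimal sketch assuming a byte alphabet (at most 256 nodes per interval) and illustrative interval boundaries; qsort stands in for the parallel hardware sorters, which would process the intervals simultaneously.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { unsigned char sym; unsigned freq; } Node;

static int by_freq(const void *a, const void *b) {
    unsigned fa = ((const Node *)a)->freq, fb = ((const Node *)b)->freq;
    return (fa > fb) - (fa < fb);
}

/* Split nodes into 3 frequency intervals, then sort each interval
 * independently.  In hardware the per-interval sorts run in parallel;
 * here they simply run one after another.  'bounds' are illustrative
 * interval limits, e.g. {16, 256}.  Assumes n <= 256. */
void interval_sort(Node *nodes, int n, const unsigned bounds[2]) {
    Node bucket[3][256];
    int cnt[3] = {0, 0, 0};

    for (int i = 0; i < n; i++) {           /* scatter into intervals */
        int k = nodes[i].freq < bounds[0] ? 0 :
                nodes[i].freq < bounds[1] ? 1 : 2;
        bucket[k][cnt[k]++] = nodes[i];
    }
    int pos = 0;                            /* sort each interval, gather */
    for (int k = 0; k < 3; k++) {
        qsort(bucket[k], cnt[k], sizeof(Node), by_freq);
        memcpy(nodes + pos, bucket[k], cnt[k] * sizeof(Node));
        pos += cnt[k];
    }
}
```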
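Result (3)'s hierarchical (level-order) code allocation can be illustrated as follows: with an explicit parent-to-child map, codes are assigned level by level from the root, each node's code being its parent's code extended by one bit. The child table and the tiny 4-leaf tree are assumptions for illustration; the thesis's actual mapping and storage structure are hardware-specific.

```c
#include <stdio.h>

#define NODES 7                    /* 4 leaves + 3 internal, root last */

int main(void) {
    /* children[i][0..1] = child indices of node i, -1 for leaves */
    int children[NODES][2] = {
        {-1, -1}, {-1, -1}, {-1, -1}, {-1, -1}, /* leaves 0..3 */
        {0, 1},                                 /* internal node 4 */
        {2, 4},                                 /* internal node 5 */
        {3, 5}                                  /* node 6 = root  */
    };
    unsigned code[NODES] = {0};
    int len[NODES] = {0};

    int queue[NODES], head = 0, tail = 0;
    queue[tail++] = NODES - 1;              /* start from the root */
    while (head < tail) {                   /* breadth-first traversal */
        int v = queue[head++];
        for (int b = 0; b < 2; b++) {
            int c = children[v][b];
            if (c < 0) continue;
            code[c] = (code[v] << 1) | (unsigned)b; /* extend parent */
            len[c] = len[v] + 1;
            queue[tail++] = c;
        }
    }
    for (int s = 0; s < 4; s++) {           /* print each leaf's code */
        printf("leaf %d: ", s);
        for (int b = len[s] - 1; b >= 0; b--)
            putchar('0' + (int)((code[s] >> b) & 1));
        putchar('\n');
    }
    return 0;
}
```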
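Result (4)'s variable-length-code splicing corresponds, in software terms, to packing codes into a bit buffer and emitting whole bytes as they fill up; a minimal sketch follows (the struct, names, and 64-bit buffer width are assumptions, not the thesis's architecture).

```c
#include <stdint.h>
#include <stdio.h>

/* Pack variable-length codes MSB-first into a 64-bit buffer and flush
 * full bytes as they become available, keeping the output stream
 * byte-aligned.  Assumes 0 < len < 32 for each code. */
typedef struct {
    uint64_t buf;       /* pending bits, newest in the low end */
    int      nbits;     /* number of valid bits in buf */
    FILE    *out;
} BitPacker;

void put_code(BitPacker *p, uint32_t code, int len) {
    p->buf = (p->buf << len) | (code & ((1u << len) - 1));
    p->nbits += len;
    while (p->nbits >= 8) {                    /* flush whole bytes */
        p->nbits -= 8;
        fputc((int)((p->buf >> p->nbits) & 0xFF), p->out);
    }
}

void flush(BitPacker *p) {                     /* pad the final byte */
    if (p->nbits > 0)
        fputc((int)((p->buf << (8 - p->nbits)) & 0xFF), p->out);
    p->nbits = 0;
}
```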
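Result (5)'s match search is, in software terms, the dictionary lookup familiar from LZ77-style compressors. The sketch below uses a hash table over 3-byte prefixes; the table size, hash function, and single-candidate policy are assumptions, and the hardware design would compare many candidate positions in parallel rather than just the most recent one.

```c
#include <string.h>

#define HBITS 12
#define HSIZE (1 << HBITS)

static int head[HSIZE];   /* last position seen per hash, -1 = none */

void init_match(void) { memset(head, -1, sizeof head); }

static unsigned hash3(const unsigned char *p) {
    return ((p[0] * 2654435761u) ^ ((unsigned)p[1] << 5) ^ p[2])
           & (HSIZE - 1);
}

/* Returns the length of a match at 'pos' against an earlier position
 * (written to *match_pos), or 0 if none.  Caller must ensure that at
 * least 3 bytes remain, i.e. pos + 3 <= end. */
int find_match(const unsigned char *data, int pos, int end, int *match_pos) {
    unsigned h = hash3(data + pos);
    int cand = head[h];
    head[h] = pos;                      /* record current position */
    if (cand < 0 || memcmp(data + cand, data + pos, 3) != 0)
        return 0;
    int len = 3;                        /* extend the match greedily */
    while (pos + len < end && data[cand + len] == data[pos + len])
        len++;
    *match_pos = cand;
    return len;
}
```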
Keywords/Search Tags:Hardware-Based Lossless Compression, Data partition strategy, Structure for Huffman Tree, Variable-Length Codes Splicing, Match Search