
Design And Implementation Of Improved Huffman-based Context Data Compression Algorithm

Posted on: 2018-08-28  Degree: Master  Type: Thesis
Country: China  Candidate: K Y Qu  Full Text: PDF
GTID: 2348330518994574  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, huge amounts of data need to be stored and processed in many fields, such as bioengineering, digital broadcasting, high-definition television, aerospace, and national defense, which poses a great challenge to disk storage and network transmission performance. The compression of massive data has therefore attracted increasing attention, because it saves storage space and improves the speed of network transmission.

This paper first discusses the classical data compression algorithms and their principles. Owing to its high compression ratio, fast encoding, and ease of implementation, the Huffman algorithm has been widely used in data compression. However, although it compresses well, the traditional Huffman algorithm does not take the context of the data into account; as a result, considerable redundancy remains, much disk space is still occupied, and the compression effect can be further improved.

This paper presents a Huffman compression algorithm based on a Markov chain, which reduces the average code length of the traditional Huffman algorithm and thus achieves a better compression effect. First, a Markov model is used to explore the correlation within the data to be compressed: every character is regarded as a state, and the number of occurrences of each state and the number of transitions between states are counted. From these statistics, the transition probability matrix is obtained and the correlation between contexts in the data is judged. Finally, the algorithm compresses correlated contexts together to achieve a better compression effect.

When the amount of data is large, a standalone machine cannot make full use of the compression algorithm because its processing capability is insufficient. Therefore, this paper ports the compression algorithm to Hadoop, one of the most popular distributed computing frameworks. In this way, large amounts of data can be compressed in parallel and compression efficiency is greatly improved. Experimental results show that the proposed approach not only shortens compression time, but also reduces storage space and further accelerates transmission over the network.
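The context-conditioned coding idea described above can be illustrated with a short sketch. The Python listing below is a minimal illustration under stated assumptions, not the thesis's implementation: the function names (transition_counts, transition_matrix, huffman_code, encode) and the sample string are invented for this sketch. It counts symbol occurrences and symbol-to-symbol transitions, normalizes the transition counts into the transition probability matrix mentioned in the abstract, and builds a separate Huffman table for each preceding-symbol context, so strongly correlated contexts receive shorter codes than a single global Huffman table would give them.

# Minimal sketch of Markov-chain-conditioned Huffman coding.
# Not the author's code; names and sample data are hypothetical.
import heapq
from collections import Counter, defaultdict


def transition_counts(data: str):
    # Count occurrences of each symbol and of each (prev, next) transition.
    occurrences = Counter(data)
    transitions = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        transitions[prev][nxt] += 1
    return occurrences, transitions


def transition_matrix(transitions):
    # Normalize transition counts into per-context probability distributions;
    # this is the transition probability matrix used to judge correlation.
    matrix = {}
    for prev, counter in transitions.items():
        total = sum(counter.values())
        matrix[prev] = {sym: n / total for sym, n in counter.items()}
    return matrix


def huffman_code(freqs):
    # Build a Huffman code {symbol: bitstring} from a frequency table.
    if len(freqs) == 1:                       # degenerate single-symbol case
        return {next(iter(freqs)): "0"}
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode(data: str) -> str:
    # Encode data with one Huffman table per preceding-symbol context;
    # the first symbol falls back to a global (unconditioned) table.
    occurrences, transitions = transition_counts(data)
    global_code = huffman_code(occurrences)
    context_codes = {prev: huffman_code(c) for prev, c in transitions.items()}
    bits = [global_code[data[0]]]
    for prev, nxt in zip(data, data[1:]):
        bits.append(context_codes[prev][nxt])
    return "".join(bits)


if __name__ == "__main__":
    sample = "abracadabra abracadabra"
    plain = "".join(huffman_code(Counter(sample))[c] for c in sample)
    print(len(encode(sample)), "bits (context-conditioned)")
    print(len(plain), "bits (plain Huffman)")

In practice the per-context tables must also be transmitted or agreed upon, so the gain appears only when the data is long enough and the contexts are sufficiently predictive; the transition probability matrix can serve as the criterion for deciding which contexts are worth conditioning on.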
Keywords/Search Tags: Markov Chain Model, Huffman Algorithm, Data Compression, Context