
Data Compression Algorithms For Cloud Storage

Posted on: 2018-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: W W Bai
Full Text: PDF
GTID: 2348330518988027
Subject: Engineering
Abstract/Summary:
With the rapid development of computer and information technology, industrial applications must process a rapidly growing volume of data every day. To handle these data effectively, cloud computing and cloud storage technologies have been proposed. In data storage, the large volumes of data that must be kept consume substantial storage resources, and transmitting those data occupies considerable network bandwidth. Data compression, as an effective way to reduce this resource consumption, has therefore become a focus of current research. A compression algorithm oriented toward the cloud can enhance data-processing capacity to meet the requirements of large-scale data processing. Focusing on data compression algorithms for cloud storage, the main work of this thesis comprises the following two parts.

The first part improves the processing efficiency and compression ratio of the compression algorithm. First, the data are transformed with the Burrows-Wheeler Transform (BWT) and character codes are obtained from the transformed output; according to these codes, the data are written into the nodes of a wavelet tree and stored as bit vectors. On this basis, the distribution of runs in the bit vectors of different data sets is studied statistically, and a mixed-coding compression structure is designed to compress the bit vectors. Then, guided by the run statistics of the wavelet-tree vectors, several accelerators are designed: an accelerator for computing integer code lengths, a run-retrieval accelerator, and gamma and delta decoding accelerators. All of these improve the execution speed of the compression algorithm. Second, exploiting the structure of the wavelet tree and the reversibility of the BWT, a data-recovery (decompression) algorithm is implemented. Finally, a multi-threaded version of the compression algorithm is implemented on a single processor.

The second part focuses on the cloud-storage implementation of the first part. Compressing data in cloud storage relies on the powerful distributed parallel processing of a cluster to handle large-scale data efficiently. In this thesis, the Hadoop platform and the MapReduce parallel programming model are used to implement cluster-based distributed data compression. First, the stand-alone compression algorithm is designed and encapsulated behind an interface that MapReduce can call directly to process block data. Then, a data-blocking strategy is devised, and the MapReduce programming model is used to implement parallel compression and decompression of data blocks on the cluster.

In the experimental work, on the one hand, the parameters of the external-memory multi-threaded compression algorithm were tuned, and the algorithm was compared with classical compression algorithms in compression ratio and in compression and decompression time. The results show that the algorithm has advantages in compression time and compression ratio, although its decompression time is slightly longer than that of bzip2. On the other hand, the compression ratio of the distributed parallel implementation is slightly worse than that of the single-machine algorithm, and it spends more time on network I/O; however, compared with the single-processor algorithm, the distributed algorithm has a clear advantage in the scale of the data sets it can process.
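The BWT step described above can be illustrated with a minimal sketch. The naive construction below sorts all rotations of the input; the thesis's actual implementation is not specified, and practical implementations build the transform from a suffix array instead. The sentinel character '$' is an assumption of this sketch.

```python
def bwt(text: str) -> str:
    """Return the Burrows-Wheeler Transform of `text` by sorting all
    rotations of text + '$' and reading off the last column."""
    s = text + "$"                       # sentinel marks the original end of the string
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)  # last column of the sorted rotation table

print(bwt("banana"))  # "annb$aa" -- repeated characters cluster together
```

The clustering of equal characters in the output is what produces the long runs in the wavelet-tree bit vectors that the mixed coding then exploits.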
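The run statistics and the gamma/delta decoding mentioned above can be sketched as follows. The thesis's exact mixed-coding structure is not given here; this sketch only shows run extraction from a bit vector and Elias gamma coding of run lengths (delta coding is analogous), which are standard building blocks consistent with the accelerators described.

```python
def runs(bits: str) -> list[int]:
    """Lengths of maximal runs of equal bits, e.g. "0001101" -> [3, 2, 1, 1]."""
    out, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            out.append(count)   # run ended; record its length
            count = 1
    out.append(count)           # final run
    return out

def gamma_encode(n: int) -> str:
    """Elias gamma code of n >= 1: (|bin(n)| - 1) zeros, then bin(n)."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def gamma_decode(code: str) -> int:
    """Decode a single Elias gamma codeword."""
    zeros = code.index("1")             # count of leading zeros gives the code length
    return int(code[zeros:2 * zeros + 1], 2)
```

Short runs get short codewords under gamma coding, which is why the run-length distribution measured in the first part drives the choice of integer code.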
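The data-recovery algorithm rests on the reversibility of the BWT. A minimal (and deliberately naive) inversion is sketched below, assuming the same '$' sentinel as above; efficient implementations instead use the last-to-first column mapping, which is exactly the rank query a wavelet tree provides.

```python
def inverse_bwt(last_column: str) -> str:
    """Recover the original text from a BWT string containing one '$' sentinel."""
    # Repeatedly prepend the last column to the table and re-sort; after
    # len(last_column) rounds the table holds every sorted rotation, and the
    # row ending in '$' (minus the sentinel) is the original text.
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    return next(row for row in table if row.endswith("$"))[:-1]

print(inverse_bwt("annb$aa"))  # "banana"
```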
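The block-parallel scheme of the second part can be illustrated on a single machine. This sketch is not the thesis's Hadoop/MapReduce implementation: `bz2` stands in for the thesis's own compressor, a thread pool stands in for the cluster's map tasks, and the block size is a hypothetical value (the thesis tunes its blocking strategy for the cluster).

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # hypothetical block size for illustration only

def compress_blocks(data: bytes) -> list[bytes]:
    """Split data into fixed-size blocks and compress them in parallel,
    analogous to the map phase of the cluster scheme."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    with ThreadPoolExecutor() as pool:   # bz2 releases the GIL, so threads overlap
        return list(pool.map(bz2.compress, blocks))

def decompress_blocks(compressed: list[bytes]) -> bytes:
    """Decompress the blocks in parallel and reassemble them in order."""
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(bz2.decompress, compressed))
```

Because each block is compressed independently, blocks can be distributed across workers; the price, as the experiments above note, is a slightly worse overall compression ratio than a single-stream compressor achieves.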
Keywords/Search Tags: Data compression, BWT transform, wavelet tree, mixed coding, distributed parallelism