Research on Non-Decompression Algebraic Operation Algorithms on Compressed Data

Posted on: 2017-03-19 | Degree: Master | Type: Thesis
Country: China | Candidate: X Z Ding
GTID: 2278330485995698 | Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, huge amounts of data are generated every day. Processing massive data generally requires filtering it for analysis and then executing operations that produce concrete results, and algebraic operations are the most basic of these. There are two main approaches to performing basic algebraic operations efficiently on big data: parallel processing and data compression. This paper focuses on data compression technology in big data research. It presents a new compression algorithm, algorithms that perform algebraic operations on the compressed data without decompressing it, and the concept of compression computation. It also implements a prototype database, with a query optimization strategy, based on the compression algorithm. The major achievements and contributions are as follows:
(1) In massive data management, some operations can be performed on compressed data without decompressing it first. Assuming normally distributed data and exploiting the characteristics of column storage, a new column-oriented compression method, CCA (Column Compression Algorithm), is proposed. The method first classifies values by length, then uses sampling to find frequently repeated prefixes, which are encoded with dictionary coding. A compressed data structure consisting of a CI (Column Index) and a CR (Column Reality) is also proposed to reduce the storage requirements of massive data. Theoretical analysis and experiments verify the effectiveness of CCA.
(2) Based on CCA, algorithms that perform set-algebra and relational-algebra operations on compressed data without decompression are designed and implemented, including union, intersection, difference, Cartesian product, selection, projection, join, and others. Theoretical analysis and experiments verify the effectiveness of these algorithms.
(3) Considering the characteristics of massive data and of the CCA-based compression algorithms, query optimization strategies for column-store databases are studied, and CCA-specific optimization strategies are given.
(4) Building on the research into CCA and into algebraic operations on CCA-compressed data, a prototype database system, D-DBMS (Ding-Database Management System), is developed.
Theoretical analysis and experiments on 1 TB of data show that the compression algorithm significantly improves both the storage efficiency and the data manipulation performance for massive data. Compared with the BAP (Bit Address Physical) and TIDC (Tuple ID Center) methods, CCA improves the compression rate by 51% and 14%, and the running speed by 47% and 42%, respectively.
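Contribution (1) describes CCA only at a high level: values are classified by length, a sample is used to find frequently repeated prefixes, and those prefixes are dictionary-encoded into a CI/CR structure. The abstract does not define CI or CR, so the Python below is a minimal sketch under our own assumptions, not the thesis's algorithm. We assume CI holds one dictionary code per row and CR holds the remainder of each value; the names `cca_compress`, `CompressedColumn`, `prefix_len`, `sample_size`, and `min_count` are hypothetical.

```python
import random
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class CompressedColumn:
    dictionary: list[str]  # code -> shared prefix; code 0 is the empty prefix
    ci: list[int]          # one prefix code per row (our reading of "Column Index")
    cr: list[str]          # remaining suffix per row (our reading of "Column Reality")

    def decode(self, row: int) -> str:
        return self.dictionary[self.ci[row]] + self.cr[row]


def cca_compress(values, prefix_len=3, sample_size=64, min_count=2, seed=0):
    """Hypothetical CCA-style compressor: length classes, sampled prefixes, dictionary codes."""
    rng = random.Random(seed)

    # Step 1: classify the column's values by length.
    by_length = defaultdict(list)
    for v in values:
        by_length[len(v)].append(v)

    # Step 2: sample each length class and keep prefixes that repeat in the sample.
    dictionary = [""]
    code_of = {"": 0}
    for length in sorted(by_length):
        group = by_length[length]
        sample = rng.sample(group, min(sample_size, len(group)))
        k = min(prefix_len, length)
        for prefix, count in Counter(v[:k] for v in sample).most_common():
            if count >= min_count and prefix not in code_of:
                code_of[prefix] = len(dictionary)
                dictionary.append(prefix)

    # Step 3: encode every row as (prefix code in CI, suffix in CR),
    # preferring the longest dictionary prefix that matches.
    ci, cr = [], []
    for v in values:
        code = 0
        for k in range(min(prefix_len, len(v)), 0, -1):
            if v[:k] in code_of:
                code = code_of[v[:k]]
                break
        ci.append(code)
        cr.append(v[len(dictionary[code]):])
    return CompressedColumn(dictionary, ci, cr)


col = cca_compress(["beijing", "beihai", "shanghai", "beijing"])
print(col.dictionary, col.ci, col.cr)
# ['', 'bei'] [1, 1, 0, 1] ['jing', 'hai', 'shanghai', 'jing']
```

Reserving code 0 for the empty prefix means a value with no frequent prefix is simply stored whole in CR, so every row remains decodable. A real column store would also pack the CI codes into fixed-width integers, which this sketch leaves out.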
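Contribution (2) depends on operating directly on compressed values. One way this can work, assumed here rather than taken from the thesis, is to compress every operand against the same prefix dictionary with a deterministic encoder. Equal values then always receive equal (code, suffix) pairs, so equality-based operators such as selection, union, intersection, difference, and equi-join can compare compressed fields without decoding them. The Python below is a hypothetical sketch of that idea; `encode`, `decode`, and the operator functions are illustrative names, and a relation is simply a list of tuples of compressed fields.

```python
from collections import defaultdict


def encode(value, dictionary):
    """Compress one value as (prefix code, suffix) using the longest matching prefix.

    dictionary[0] must be the empty prefix, so every value has a match.
    """
    best = 0
    for code, prefix in enumerate(dictionary):
        if value.startswith(prefix) and len(prefix) > len(dictionary[best]):
            best = code
    return best, value[len(dictionary[best]):]


def decode(pair, dictionary):
    """Only needed to present results; the operators below never call it."""
    code, suffix = pair
    return dictionary[code] + suffix


def select_eq(rows, col, constant, dictionary):
    # Compress the constant once, then compare compressed fields directly.
    key = encode(constant, dictionary)
    return [r for r in rows if r[col] == key]


def project(rows, cols):
    # Keep the requested compressed fields and drop duplicate tuples.
    return list(dict.fromkeys(tuple(r[c] for c in cols) for r in rows))


def union(a, b):
    return list(dict.fromkeys(a + b))


def intersect(a, b):
    b_set = set(b)
    return [r for r in dict.fromkeys(a) if r in b_set]


def difference(a, b):
    b_set = set(b)
    return [r for r in dict.fromkeys(a) if r not in b_set]


def cartesian(a, b):
    return [ra + rb for ra in a for rb in b]


def equi_join(a, b, col_a, col_b):
    # Hash join on compressed keys; both columns must share one dictionary.
    index = defaultdict(list)
    for rb in b:
        index[rb[col_b]].append(rb)
    return [ra + rb for ra in a for rb in index.get(ra[col_a], [])]
```

Range predicates such as `<` are not covered: prefix codes do not preserve the ordering of the original values unless the dictionary is deliberately built to be order-preserving, which this sketch does not attempt.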
Keywords/Search Tags: Massive data compression, Compression computation, Column Compression Algorithm (CCA), Column Index (CI), Column Reality (CR), Algebraic operation