Compression Algorithms Supporting Column-Stored Data

Posted on: 2011-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
GTID: 2208360302498811
Subject: Computer application technology
Abstract/Summary:
As business operations keep growing, the volume of data held by many decision support systems and OLAP systems is approaching the terabyte scale, and the storage burden is becoming hard to sustain. For tables of this size, a central question in database research is how to use storage more efficiently and cut maintenance costs while still answering queries on those tables quickly. One effective way to address these problems is to introduce data compression into the database system.
In a conventional relational database system, data is stored row by row (row-store): the values of the different attributes of a tuple are stored consecutively. Because neighbouring values then come from different attributes with different domains, they are only weakly correlated, which makes data compression hard to apply in a row-store system.
Column-store databases remove this obstacle. In a column-store database system, each attribute is stored in a separate column, so that successive values of that attribute are stored consecutively on disk. Adjacent values in a column tend to be similar, and this property benefits many classical data compression algorithms. This paper therefore discusses how these classical algorithms can be integrated into a column-store database.
First, we explain why data compression matters in a column-store setting, and review the history of data compression and the current state of well-known foreign commercial column-store databases.
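To illustrate the layout argument above, the following Python sketch stores a small table both row by row and column by column and counts the runs of equal adjacent values in each layout. The table contents and the helper names `row_layout`, `column_layout`, and `count_runs` are hypothetical and chosen only for illustration; they are not taken from the thesis. Fewer runs simply mean more adjacent redundancy for an algorithm such as RLE to exploit.

```python
from itertools import groupby

# Hypothetical sample table: (region, year, amount), one tuple per row.
rows = [
    ("north", 2010, 120),
    ("north", 2010, 95),
    ("north", 2011, 95),
    ("south", 2011, 40),
    ("south", 2011, 40),
    ("south", 2011, 310),
]


def row_layout(table):
    """Row-store: all attribute values of one tuple are stored consecutively."""
    return [value for row in table for value in row]


def column_layout(table):
    """Column-store: each attribute is stored as its own contiguous sequence."""
    return [list(column) for column in zip(*table)]


def count_runs(values):
    """Number of maximal runs of equal adjacent values (what RLE would store)."""
    return sum(1 for _ in groupby(values))


print(count_runs(row_layout(rows)))                         # 18
print(sum(count_runs(col) for col in column_layout(rows)))  # 8
```

In the row layout no two neighbouring values are equal, so all 18 values form their own runs; stored column-wise, the same data collapses to 8 runs (2 for region, 2 for year, 4 for amount).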
Specifically, after summarizing the relevant concepts of data compression and the advantages of column-store databases, we discuss a range of data compression algorithms, including Huffman encoding, arithmetic encoding, LZ77, LZW, run-length encoding (RLE), and null suppression.
Second, we study the design of a Column-Store Compression Library, which consists of a storage mechanism, a compression module, and a data-source module. The storage mechanism underpins the implementation of the compression algorithms, because it defines a sound way to store differently compressed data in the column-store database. The compression module hides compression details behind a standard interface offered to external modules. The data-source module mediates between the compression module and the database storage layer. In addition, by analyzing the properties of differently compressed data and extending the traditional database executor operators, we describe how the query execution engine can read compressed data directly without decompressing it (compression state query).
Finally, we implement these key techniques on the Shenzhou OSCAR database platform. Analysis and comparison of the performance test results confirm both the validity and the effectiveness of this research: it not only reduces the storage size of the column-store database but also automatically optimizes database system performance.
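The compression module's standard interface and the compression state query can be sketched for the simplest case, run-length encoding. The Python code below is a minimal illustration under assumed names: `ColumnCodec` stands in for a uniform codec interface, and `RunLengthCodec`, `count_equal`, and `sum_column` are hypothetical helpers, not the interfaces implemented in Shenzhou OSCAR. The two aggregates walk the (value, run_length) pairs directly, so their cost depends on the number of runs rather than the number of rows.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

# A run is a (value, run_length) pair.
Run = Tuple[int, int]


class ColumnCodec(ABC):
    """Hypothetical uniform interface that hides codec details from callers."""

    @abstractmethod
    def compress(self, values: List[int]) -> list:
        ...

    @abstractmethod
    def decompress(self, data: list) -> List[int]:
        ...


class RunLengthCodec(ColumnCodec):
    """Run-length encoding: collapse each run of equal adjacent values."""

    def compress(self, values: List[int]) -> List[Run]:
        runs: List[Run] = []
        for v in values:
            if runs and runs[-1][0] == v:
                # Same value as the previous one: extend the current run.
                runs[-1] = (v, runs[-1][1] + 1)
            else:
                # New value: start a new run of length 1.
                runs.append((v, 1))
        return runs

    def decompress(self, data: List[Run]) -> List[int]:
        return [v for v, n in data for _ in range(n)]


def count_equal(runs: List[Run], target: int) -> int:
    """COUNT(*) WHERE col = target, evaluated on the runs without decompressing."""
    return sum(n for v, n in runs if v == target)


def sum_column(runs: List[Run]) -> int:
    """SUM(col): each run contributes value * run_length."""
    return sum(v * n for v, n in runs)


codec = RunLengthCodec()
column = [7, 7, 7, 3, 3, 9, 9, 9, 9]
runs = codec.compress(column)  # [(7, 3), (3, 2), (9, 4)]
assert codec.decompress(runs) == column
print(count_equal(runs, 9))  # 4
print(sum_column(runs))      # 63
```

Here the nine-row column is answered from three runs. Keeping the codec behind an abstract interface lets an executor operator ask whether the data is RLE-encoded and, if so, call the run-aware aggregate instead of decompressing first.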
Keywords: data compression, column-store database, Column-Store Compression Library, compression state query