Research Of Compression Algorithm For Sparse Data In Column-oriented Database

Posted on: 2011-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: X L Qiao
Full Text: PDF
GTID: 2178360305468169
Subject: Computer application technology
Abstract/Summary:
With the wide use of OLAP technologies such as data warehousing and decision support, there is an increasing demand for efficient ways to improve the performance of database query engines. One response is a new approach to database system design, the column-oriented database, which stores its content by column rather than by row.

In this paper, we first compare column-oriented and row-oriented databases at the storage layer and in the query engine, and observe that the query engine of a column-oriented database is much more efficient because of its storage structure. We also analyze compression techniques such as Null Suppression, Dictionary Encoding, and Run-length Encoding, together with the effects of operating directly on compressed data, and we discuss the effect of Late Materialization on the query engine.

By analyzing the characteristics of sparse data, we conclude that a column-oriented database is well suited to storing sparse data, and we describe how to design a sparse database. We then investigate sparse data scenarios other than OLAP, analyze the storage structure characteristics of sparse data, and present three common data models for sparse databases.

Finally, we study Lempel-Ziv, a Dictionary Encoding algorithm, compare its two variants, LZ77 and LZ78, and propose an improved algorithm based on LZ77 and LZ78 that combines their advantages. Experiments comparing the improved algorithm with LZ77 and LZ78 in compression ratio and compression time show that the improved algorithm performs much better overall.
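As a rough illustration of two ideas named in the abstract, Run-length Encoding and operating directly on compressed data, the following minimal Python sketch encodes a sparse column as (value, run length) pairs and aggregates over the runs without decompressing. It is not the thesis's algorithm: the function names, the use of `None` to represent a missing value, and the sample column are all assumptions made for this example.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def count_non_null(runs):
    """Count non-null cells by reading run lengths only (no decompression)."""
    return sum(length for value, length in runs if value is not None)


def sum_column(runs):
    """Sum a numeric column on the compressed form: value * run_length per run."""
    return sum(value * length for value, length in runs if value is not None)


# Hypothetical sparse column: None stands for a missing (null) cell.
column = [None, None, None, 7, 7, None, None, None, None, 3]
runs = rle_encode(column)

non_null = count_non_null(runs)
total = sum_column(runs)
restored = rle_decode(runs)
```

Here the ten-cell column collapses to four runs, and both aggregates touch one pair per run rather than one value per cell, which is the kind of saving that long null runs in sparse columns make possible.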
Keywords/Search Tags:Column-Oriented Database, Sparse Data, Compression Technology, Lempel-Ziv, Query Efficiency