Building compressed database systems

Posted on:2003-03-22

Degree:Ph.D

Type:Thesis

University:Cornell University

Candidate:Chen, Zhiyuan

Full Text:PDF

GTID:2468390011989735

Subject:Computer Science

Abstract/Summary:

Over the last decades, improvements in CPU speed have outpaced improvements in disk access rates by orders of magnitude, motivating the use of data compression techniques in database systems to trade reduced disk I/O against additional CPU overhead for compression and decompression of data. In this thesis, we study how to build compressed database systems that achieve better performance than conventional database systems. We address the following four issues.; First, we study how to compress string data stored in database systems. Despite the abundance of string-valued attributes in relational schemas, there is little work on compression for string attributes in a database context. We propose a Hierarchical Dictionary Encoding strategy that intelligently selects the most effective compression method for string-valued attributes, and achieves significant performance improvement over standard compression techniques such as Lempel-Ziv.; Second, we examine how to run queries efficiently on compressed data. None of the previous work suitably addresses the role of the query optimizer: during query execution, data are either eagerly decompressed when they are read into main memory, or data lazily stay compressed in main memory and are decompressed only on demand. We introduce the compression-aware query optimization problem and propose two algorithms: a provably optimal algorithm and a fast algorithm that almost always generates optimal execution strategies. Experiments using TPC-H data demonstrate that our approach results in up to an order speed-up over existing approaches.; Third, we study the problem of automatically selecting compression methods based on a query workload, which is crucial for the performance and usability of compressed database systems. We examine the complexity of the problem and propose a solution that conducts a cost-based search on possible compression configurations. Our experiments using both real and TPC-H data demonstrate the effectiveness of our approach.; Finally, we examine how to compress results shipped to clients that have limited bandwidth connections or severe storage restrictions. We choose a combination of compression methods that use statistical and semantic information of the query results to enhance the effect of compression.

Keywords/Search Tags:

Database systems, Compression, Query

Related items

1	Compressed Xml Database Query Processing Method
2	Research On Compression, Operation And Query Processing Methods Of Massive Datasets
3	Efficient Methods For Reverse K-Nearest Neighbor Query On The Multidimensional Objects
4	Query optimization in distributed database systems and multidatabase systems
5	Compression Algorithm Based On Support Columns Stored Data
6	Research On The Query Information In The Chinese Query Sentences Of Database
7	Statistics on query expressions in relational database management systems
8	Based On Historical Query Relational Database Query Optimization Keyword Research Questions
9	The Research And Application Of Database Value-based Query Optimization
10	Based On The Keyword Query Database Search Engine System Design And Implementation