Font Size: a A A

Analytic query processing at bare metal speeds

Posted on:2016-03-08Degree:Ph.DType:Thesis
University:The University of Wisconsin - MadisonCandidate:Li, YinanFull Text:PDF
GTID:2478390017981347Subject:Computer Science
Abstract/Summary:
Modern large-scale data processing platforms require techniques to process data at high speeds, as there is an arms race towards building and deploying near real-time analytical systems. The primary goal of this dissertation is to design fast analytical data processing techniques that aim to process data at or near the speed of the underlying hardware.;In this dissertation, we describe several techniques that can improve the speed and scalability of interactive analytic data processing systems. First, we present a fast scan method, which is a core operation in interactive analytic data engines. The proposed scan method, called BitWeaving, exploits the parallelism available at the bit level in modern processors, and operates on multiple compressed data items in each processor instruction. Second, we describe an encoding scheme that turns skew in both the data and predicate distributions into a benefit for scan performance. This scheme creates encodings that map frequent data items to (compressed) codes that can easily be distinguished from other codes by only examining a few bits in the full code. Consequently, the encoding scheme reduces the memory bandwidth and CPU costs that are consumed when evaluating scans. Third, given these fast scan algorithms and other technical trends, we believe that an old idea, namely denormalization, now becomes more practical in modern analytical systems. The last part of the thesis presents a technique called WideTable, which uses aggressive denormalization to flatten a database schema into one or more "wide tables", and converts complex join queries to simple and efficient (BitWeaved) scans on the denormalized table. To avoid the pitfalls associated with denormalization, e.g. large space overhead, WideTable uses a combination of techniques including dictionary encoding and columnar storage.;Finally, we benchmark the performance of the proposed techniques in the Quickstep database system. We experimentally evaluate our methods, and compare them to the leading open-source main memory DBMS in a main memory setting using the TPC-H and the star schema benchmarks. We find that our methods outperform the traditional method for nearly all queries, with over an order of magnitude in performance improvement on a majority of queries. In addition, our methods also show better scalability on modern multi-core CPUs.
Keywords/Search Tags:Processing, Data, Modern, Techniques, Analytic
Related items