
A Query Optimizer For The Column-Oriented Distributed In-Memory Database System

Posted on: 2017-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: X K Zeng
GTID: 2308330485985008
Subject: Computer system architecture
Abstract/Summary:
As the demand for querying and analyzing data grows, traditional relational databases can no longer meet the requirements. Because memory I/O is far faster than disk I/O, using memory as the storage medium significantly reduces database access latency. Column-oriented architectures are widely used in in-memory database systems because they compress better and produce smaller intermediate results during query processing.

In this thesis, we discuss the differences between in-memory and disk storage, and between column-oriented and row-oriented architectures. We then design a query optimizer for a column-oriented in-memory database system. The optimizer includes the following:

1. The rule-based optimization used in traditional relational databases is applied to the column-oriented architecture. A series of transformations, such as predicate pushdown and condition simplification, is applied to the logical query tree so that it contains fewer operator units and processes a smaller amount of data in a distributed environment.

2. For queries that join multiple tables, a non-random stratified dynamic programming algorithm, chosen with the practical application environment in mind, calculates the optimal join order. The implementation provides an interface for changing the join-path algorithm, so that different algorithms can supply a more suitable join path in different application environments.

3. In the distributed environment, the expected cost of each candidate query plan is calculated from information such as table storage location, network overhead, and node load. A greedy algorithm and a genetic algorithm are then used to choose the best query plan for parallel execution on the distributed cluster, improving the ability to respond to real-time queries.

The query optimization module was implemented on GoldFish, a column-oriented distributed in-memory database system, and compared with Spark-SQL, a popular open-source DBMS, under the same conditions. GoldFish with the query optimizer outperforms Spark-SQL in query latency and memory usage.
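To make the join-order step more concrete, the following is a minimal sketch of textbook dynamic programming over table subsets. It is not the non-random stratified algorithm from the thesis, whose details the abstract does not give. The function name `best_join_order`, the table names, the cardinalities, the selectivities, and the cost model (sum of estimated intermediate result sizes) are all assumptions chosen for illustration.

```python
from itertools import combinations


def best_join_order(card, sel):
    """Pick a join order by dynamic programming over table subsets.

    card: dict mapping table name -> estimated row count (hypothetical stats).
    sel:  dict mapping frozenset({a, b}) -> join selectivity; a missing pair
          is treated as a cross product (selectivity 1.0).
    Returns (cost, plan, rows); plan is a nested tuple of table names.
    Assumed cost model: sum of the estimated sizes of all join outputs.
    """
    tables = sorted(card)
    # best[subset] = (cheapest cost, plan, estimated output rows)
    best = {frozenset([t]): (0.0, t, float(card[t])) for t in tables}

    def rows_of(subset):
        # Estimated size of joining every table in the subset.
        rows = 1.0
        for t in subset:
            rows *= card[t]
        for a, b in combinations(sorted(subset), 2):
            rows *= sel.get(frozenset((a, b)), 1.0)
        return rows

    # Build optimal plans for subsets of increasing size.
    for size in range(2, len(tables) + 1):
        for combo in combinations(tables, size):
            subset = frozenset(combo)
            out_rows = rows_of(subset)
            candidates = []
            # Try every split of the subset into two non-empty halves.
            for k in range(1, size // 2 + 1):
                for left_tables in combinations(combo, k):
                    left = frozenset(left_tables)
                    right = subset - left
                    left_cost, left_plan, _ = best[left]
                    right_cost, right_plan, _ = best[right]
                    total = left_cost + right_cost + out_rows
                    candidates.append((total, (left_plan, right_plan)))
            cost, plan = min(candidates, key=lambda c: c[0])
            best[subset] = (cost, plan, out_rows)

    return best[frozenset(tables)]


# Hypothetical statistics for three tables.
card = {"A": 1000, "B": 100, "C": 10}
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
cost, plan, rows = best_join_order(card, sel)
print(plan)
```

With these made-up statistics the sketch joins B with C first, because that pair produces the smallest intermediate result, and only then joins A. Enumerating all subsets is exponential in the number of tables, which is one reason an optimizer might keep the join-path algorithm replaceable, as the abstract describes.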
Keywords/Search Tags:In-Memory Database, Query Optimizer, Join Path, Genetic Algorithm, Greedy Algorithm