Design And Implementation Of Query Optimizer For Massive Distributed Columnar Database

Posted on:2021-02-16

Degree:Master

Type:Thesis

Country:China

Candidate:Q P Ao

Full Text:PDF

GTID:2428330623467759

Subject:Computer Science and Technology

Abstract/Summary:


In the era of big data, storing and analyzing massive volumes of data quickly has become a research hotspot in the database field, and high-performance distributed databases have emerged to meet this demand. Traditional distributed databases usually adopt row-oriented storage, which limits both computational efficiency and storage efficiency. A distributed columnar database slices each table into shards and stores each column of each shard separately. This layout exploits the similarity of values within a sharded column for efficient compression and retrieval, and lets computation make full use of the processor's parallel processing capabilities, which speeds up calculations and increases hardware utilization. This thesis designs and implements an online analytical processing (OLAP)-oriented query optimizer for distributed columnar databases. Its goals are to solve query optimization problems in complex query scenarios, accelerate task execution, and use hardware resources efficiently. The main work is as follows:
1. Research and compare the query optimizers of row-oriented and column-oriented databases, design the query optimization process for the distributed columnar setting, and design and implement the full pipeline, from SQL statement parsing to the division and scheduling of the distributed execution plan;
2. Design and implement a set of operators suited to the characteristics of columnar computation. In particular, for the distributed scheduling and execution of complex statements such as grouping and aggregation, design targeted solutions for the columnar setting that increase the degree of parallelism between operators, reduce data transmission overhead, and return query results quickly;
3. Taking into account both the performance gains that compiled execution brings to operators and the data distribution of the columnar setting, design a corresponding scheduling algorithm that provides operator fusion suggestions to the execution engine and improves the computing efficiency of operators.
Finally, the functionality and performance of the query optimizer are tested to verify its usability and advantages. For single-table scans, multi-table joins, grouping aggregation, and sorting, queries on the database using the query optimizer designed here take only a fraction of the time SparkSQL needs, and in some cases run tens of times faster.
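The columnar layout described above (tables sliced into shards, each column stored separately so similar values sit together) can be illustrated with a minimal Python sketch. The abstract does not name a compression scheme or slicing policy; run-length encoding and the fixed shard size used here are assumptions chosen for illustration, and all function names are hypothetical.

```python
from itertools import groupby

def to_columns(rows, names):
    # Pivot row-oriented tuples into one value list per column.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def shard(values, shard_size):
    # Slice a column into fixed-size shards (hypothetical slicing policy).
    return [values[i:i + shard_size] for i in range(0, len(values), shard_size)]

def rle_encode(values):
    # Run-length encode a shard: each run of equal values becomes (value, count).
    return [(v, sum(1 for _ in g)) for v, g in groupby(values)]

def rle_decode(runs):
    # Expand (value, count) pairs back into the original shard.
    return [v for v, n in runs for _ in range(n)]

rows = [
    ("CN", 2021, 10), ("CN", 2021, 12), ("CN", 2021, 7),
    ("US", 2021, 3), ("US", 2020, 5), ("US", 2020, 9),
]
cols = to_columns(rows, ["country", "year", "amount"])
country_shards = shard(cols["country"], 3)
encoded = [rle_encode(s) for s in country_shards]
print(encoded)  # [[('CN', 3)], [('US', 3)]]
```

Because each shard holds values from a single column, low-cardinality columns such as `country` collapse to a handful of runs, whereas a row-oriented page would interleave them with unrelated values.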
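The abstract mentions reducing data transmission for distributed grouping and aggregation but does not spell out the thesis's solution. A common technique with that goal is two-phase (partial/final) aggregation, sketched below in Python as an illustration only, not as the author's method. Each node pre-aggregates its own shard, partial states are routed to reducers by hashing the group key, and reducers merge them. The SUM/COUNT/AVG choice and all names are assumptions.

```python
from collections import defaultdict

def partial_aggregate(keys, values):
    # Phase 1, local to each node: keep a [sum, count] state per group key.
    acc = defaultdict(lambda: [0, 0])
    for k, v in zip(keys, values):
        acc[k][0] += v
        acc[k][1] += 1
    return dict(acc)

def shuffle(partials, num_reducers):
    # Route every partial state to a reducer chosen by hashing its group key,
    # so all states for one key meet at the same reducer.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for partial in partials:
        for k, state in partial.items():
            buckets[hash(k) % num_reducers][k].append(state)
    return buckets

def final_aggregate(bucket):
    # Phase 2: merge partial states and compute (SUM, COUNT, AVG) per key.
    out = {}
    for k, states in bucket.items():
        total = sum(s for s, _ in states)
        count = sum(c for _, c in states)
        out[k] = (total, count, total / count)
    return out

# Two hypothetical nodes, each holding a shard of (group key, value) columns.
node_shards = [
    (["a", "b", "a"], [1, 2, 3]),
    (["b", "c", "a"], [4, 5, 6]),
]
partials = [partial_aggregate(k, v) for k, v in node_shards]
result = {}
for bucket in shuffle(partials, num_reducers=2):
    result.update(final_aggregate(bucket))
print(sorted(result.items()))
```

Only one state per distinct key leaves each node (5 states here instead of 6 rows). On real data with many rows per key, that difference is what cuts network traffic.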
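The third contribution, a scheduler that gives the execution engine operator fusion suggestions, can be pictured with the following Python sketch. The operator names, the set of pipeline breakers, and the grouping rule are hypothetical assumptions, not the thesis's algorithm. Consecutive operators that can stream rows are grouped into one fusible pipeline, and operators that must materialize their input start a new stage. The second function shows what a fused Scan, Filter, Project and partial SUM pipeline computes in a single pass without intermediate columns.

```python
# Hypothetical classification: these operators must see (or ship) their whole
# input before producing output, so they end a fusible pipeline.
PIPELINE_BREAKERS = {"Exchange", "Sort", "HashJoinBuild", "FinalAgg"}

def suggest_fusion(plan):
    # Split a linear operator chain into groups that could be fused.
    groups, current = [], []
    for op in plan:
        if op in PIPELINE_BREAKERS:
            if current:
                groups.append(current)
                current = []
            groups.append([op])
        else:
            current.append(op)
    if current:
        groups.append(current)
    return groups

def fused_filter_project_sum(prices, qtys, min_qty):
    # One loop replaces Filter -> Project -> partial SUM; no intermediate
    # filtered or projected columns are materialized.
    total = 0
    for p, q in zip(prices, qtys):
        if q >= min_qty:      # Filter
            total += p * q    # Project (p * q) and accumulate the SUM
    return total

plan = ["Scan", "Filter", "Project", "PartialAgg", "Exchange", "FinalAgg", "Sort"]
print(suggest_fusion(plan))
# [['Scan', 'Filter', 'Project', 'PartialAgg'], ['Exchange'], ['FinalAgg'], ['Sort']]
print(fused_filter_project_sum([10, 20, 30], [1, 5, 2], 2))  # 160
```

A compiled-execution engine would generate code resembling the fused loop for each suggested group. The interpreted Python shown here only mimics the data flow.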

Keywords/Search Tags:

Columnar database, distributed computing, query optimization, scheduling optimization, online analytical processing


Related items

1	Massive Distributed In-memory Columnar Database Query Engine For On-line Analytical Processing
2	Research On Key Technologies Of Distributed Rank-aware Query Processing
3	Research On Data Query Processing And Optimization In Distributed Database
4	Cost Model And Query Optimization For In-memory Column-stored Database
5	Distributed Joins And Optimization For BIG Table Based On Database OceanBase
6	Design And Implementation Of Optimization Method For Distributed Columnar In-Memory Database Storage Engine
7	Optimizing Query Processing In Distributed In-Memory Databases
8	Study Of Query Optimization In Distributed Database
9	Research On Query Optimization Technology In Distributed Real Time Database
10	Parallel Processing And Optimization Of Database Query Under Distributed Architecture