Design And Implementation Of Query Optimizer For Massive Distributed Columnar Database

Posted on: 2021-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q P Ao
GTID: 2428330623467759
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, the rapid storage and analysis of massive data sets has become a hot topic in database research, and high-performance distributed databases have emerged in response. Traditional distributed databases often use row storage, which reduces both computational efficiency and storage efficiency. A distributed columnar database instead shards each table and stores every column of each shard separately. This layout exploits the similarity of values within a column shard for efficient compression and retrieval, and it lets computation make effective use of the processor's parallel processing power, which speeds up calculations and raises hardware utilization.

This thesis designs and implements a query optimizer for online analytical processing (OLAP) in the distributed columnar database scenario. The goal is to solve query optimization problems in complex query scenarios, accelerate task execution, and use hardware resources efficiently. The main work is as follows:

1. Study and compare the query optimizers of row-oriented and column-oriented databases, design the query optimization process for the distributed columnar scenario, and design and implement the full pipeline from SQL statement parsing to the partitioning and scheduling of distributed execution plans.
2. Design and implement a set of operators suited to the characteristics of columnar computation. In particular, for the distributed scheduling and execution of complex statements such as grouping and aggregation, design a solution targeted at the columnar scenario that increases the degree of parallelism between operators, reduces data transmission overhead, and returns query results quickly.
3. Taking into account the performance gains of compiled execution during operator execution and the data distribution in the columnar scenario, design a corresponding scheduling algorithm that provides operator fusion suggestions to the execution engine and improves the computational efficiency of the operators.

Finally, the function and performance of the query optimizer are tested to verify its usability and its advantages. For single-table scans, multi-table joins, grouping and aggregation, and sorting, a database using the query optimizer designed here needs only a fraction of the query time of SparkSQL, in some cases several tens of times less.
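To make the distributed grouping and aggregation in the second work item concrete, the following is a minimal Python sketch of two-phase aggregation over column shards. The abstract does not describe the thesis's actual operators, so everything here is an assumption made for illustration: the function names `partial_aggregate` and `merge_partials`, the sample shards, and the choice of a sum aggregate. The idea it shows is a common one: each shard aggregates its own columns locally, so only small per-group partial results, rather than raw rows, have to be transmitted to the node that merges them.

```python
from collections import defaultdict

def partial_aggregate(keys, values):
    """Per-shard step (hypothetical): sum `values` grouped by `keys`.

    `keys` and `values` are two parallel columns of the same shard.
    """
    acc = defaultdict(int)
    for k, v in zip(keys, values):
        acc[k] += v
    return dict(acc)

def merge_partials(partials):
    """Coordinator step (hypothetical): combine per-shard partial sums."""
    total = defaultdict(int)
    for partial in partials:
        for k, v in partial.items():
            total[k] += v
    return dict(total)

# Assumed sample data: one table split into two shards,
# each shard stored column by column.
shards = [
    {"region": ["east", "west", "east"], "sales": [10, 20, 30]},
    {"region": ["west", "west", "north"], "sales": [5, 15, 40]},
]

# Phase 1 can run on every shard in parallel; phase 2 merges the results.
partials = [partial_aggregate(s["region"], s["sales"]) for s in shards]
result = merge_partials(partials)
print(result)
```

This split works for aggregates such as sum, count, min, and max, whose partial results can be combined directly. An average would need each shard to send a sum and a count instead.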
Keywords/Search Tags: Columnar database, distributed computing, query optimization, scheduling optimization, online analytical processing