Font Size: a A A

Fuzzing Methods For Query Processing Functionality Of Analytical Databases

Posted on:2022-11-25Degree:MasterType:Thesis
Country:ChinaCandidate:Z K XiangFull Text:PDF
GTID:2518306776992679Subject:Enterprise Economy
Abstract/Summary:PDF Full Text Request
Analytical databases are mainly used to efficiently handle complex queries on largescale data sets,covering query optimization and query execution.As the most important and core functionality of modern analytical databases,query processing involves a wide range of codes and complex code logics,resulting in a costly testing effort.The lack of sufficient and flexible testing may lead to serious problems in the production environment.Existing work of verifying the correctness of query execution suffers from the problem of low-quality queries and difficulty in generating test oracles(ideal result sets or validation sets).Due to the lack of data skewness and query diversity,the commonly used standard benchmarks such as SSB and TPC-H are not suitable for evaluating the core aspects of query optimization,such as cardinality estimation and join order selection.For the optimizer evaluation,existing work also has the problem of not touching the core of the optimizer.At the same time,existing methods for generating the exact cardinalities of query operators are inefficient with a significant resource overhead and low feasibility.To better facilitate the evaluation of query execution and query optimization during developing databases,this dissertation designs and implements a fuzzing tool for verifying query processing functionality.Specifically,this dissertation guarantees the diversity of data and query generation based on the randomization theory,and data dependency chains between tables are built through the PK-FK association relationship;the output result representation of operators is modeled based on Constraint Satisfaction Problem and an efficient and scalable solver is also built to automatically generate the exact cardinalities of operators as well as the ideal result of the query.Furthermore,this dissertation prepares a join order evaluation module to evaluate the join ordering of the query optimizer.By applying the tool to the real-world DBMSs,this dissertation demonstrates the effectiveness and usability of this work.The main contributions of this dissertation are summarized as follows:1.Generate large-scale data with skewness and good support for joins based on the deterministic data generation mechanism,which provides a rich feature selection space for query generation,and satisfies the simulation needs of real-world test scenarios.2.Generate multi-join query templates with correct syntax and semantics based on randomization theory in a scalable way,and design a result-guided query parameter instantiation algorithm to ensure the query results are meaningful and not empty.3.Propose a exact cardinality generation method based on Constraint Satisfaction Problem and build a scalable solver to automatically and efficiently generate the exact cardinalities of operators and ideal results for queries.A join reordering algorithm based on the exact join cardinality is also designed to evaluate the join order selection in the query optimizer.4.Implement a fuzzing tool for query processing,which can be used to test the correctness of query execution,the accuracy of query optimizer cardinality estimation,and the quality of j oin order selection.This dissertation applies the tool to in-production database systems for demonstrating test scenario generation efficiency,query execution correctness testing,cardinality estimation accuracy evaluation,and join order selection assessment.And some related functional implementation problems and performance issues are found,which give a strong certificate to the usability of the tool.
Keywords/Search Tags:Data Generation, Workload Generation, Query Verification, Query Optimization, Cardinality Estimation
PDF Full Text Request
Related items