| As the core of a database management system, the query optimizer aims to select an optimal plan for each query. Traditional optimizers typically build a heuristic cost model from metadata and expert experience and use it to select the best plan in the plan search space. However, because of errors in the heuristic cost model, these optimizers cannot find the best query plan in a reasonable time. Recently, database researchers have proposed learning-based query optimization methods that use machine learning techniques to learn high-quality plans from past experience. However, these methods embed only simple metadata as features in the learning model and ignore high-level metadata, i.e., the impact of data constraints. To address this, this thesis proposes a constraint-enhanced learning-based query optimization system that uses data constraints to infer equivalent queries and then applies a learning model to predict the optimal query and its query plan. The research in this thesis focuses on the following three points. (1) Learning-based query optimization methods make insufficient use of data constraints and can therefore fail to select the optimal plan for a query. Observing that data constraints yield equivalent queries with different query costs, this thesis proposes a novel learning-based query optimization method, Celo. The thesis first designs a predictive model based on tree convolution and then proposes a method that uses data constraints to find an equivalent query plan for a given query. This plan is used to augment the set of alternative plans for the given query, and the optimal plan is then predicted using the predictive model. Finally, the experience and equivalence knowledge generated from the optimal plan are used to train the predictive model, forming a training loop. The experimental results show that Celo improves average query optimization performance by 12.5% compared to the
cutting-edge learning-based query optimizer Bao, and by 18.5% compared to PostgreSQL. (2) When a query has a large number of equivalent queries, randomly selecting one of them for augmentation does not produce the best query optimization effect. To solve the problem of selecting an optimal equivalent query for plan augmentation, this thesis proposes a learning-based equivalent optimal query selection method, EOQS. First, this thesis proposes a new feature extraction method that extracts features not only from the query but also from the estimated plan that the expert optimizer provides for the current query. Then, a learning-based selection model is designed to efficiently learn how to select the optimal query based on the extracted features. Finally, experiments verify that EOQS selects the optimal query among the equivalent queries with higher accuracy than three baseline methods. (3) This thesis combines the above studies into a unified query optimization system. The system first obtains multiple equivalent queries for a given query in Celo through data constraints, then uses EOQS to pre-select the optimal query among them, and uses the plan of this optimal query to augment the set of alternative plans generated for the given query in Celo. Experiments demonstrate that this system achieves faster query performance than Celo with a randomly selected equivalent query for augmentation. |
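
The control flow of the unified system in point (3) can be illustrated with a minimal Python sketch. It is not the thesis implementation: `Plan`, `optimize`, and every `toy_*` helper are hypothetical names introduced here. The constraint-based rewriter, the EOQS selection model, the expert optimizer, and the tree-convolution predictor are all replaced by simple stand-in functions, and the cost numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Plan:
    query: str  # SQL text the plan was generated for
    label: str  # illustrative plan identifier (hypothetical)


def optimize(
    query: str,
    rewrite: Callable[[str], List[str]],                # stand-in: constraint-based equivalent queries
    select_query: Callable[[str, Sequence[str]], str],  # stand-in: EOQS-style selection
    enumerate_plans: Callable[[str], List[Plan]],       # stand-in: expert optimizer
    predict_cost: Callable[[Plan], float],              # stand-in: learned cost predictor
) -> Plan:
    # Alternative plans generated for the given query.
    candidates = list(enumerate_plans(query))
    # Equivalent queries inferred from data constraints.
    equivalents = rewrite(query)
    if equivalents:
        # Pre-select one equivalent query and add its plans to the candidate set.
        chosen = select_query(query, equivalents)
        candidates.extend(enumerate_plans(chosen))
    # Return the plan with the lowest predicted cost.
    return min(candidates, key=predict_cost)


# Toy setup. The rewrite assumes emp.dept_id is a NOT NULL foreign key to the
# primary key dept.id, so the inner join neither drops nor duplicates emp rows.
ORIGINAL = "SELECT e.name FROM emp e JOIN dept d ON e.dept_id = d.id"
REWRITTEN = "SELECT e.name FROM emp e"
TOY_COST = {"hash_join": 40.0, "nested_loop": 95.0, "seq_scan": 12.0}  # made-up values


def toy_rewrite(query: str) -> List[str]:
    return [REWRITTEN] if query == ORIGINAL else []


def toy_select(query: str, equivalents: Sequence[str]) -> str:
    # Placeholder policy: prefer the shortest query text.
    return min(equivalents, key=len)


def toy_plans(query: str) -> List[Plan]:
    if query == ORIGINAL:
        return [Plan(query, "hash_join"), Plan(query, "nested_loop")]
    return [Plan(query, "seq_scan")]


def toy_cost(plan: Plan) -> float:
    return TOY_COST[plan.label]


best = optimize(ORIGINAL, toy_rewrite, toy_select, toy_plans, toy_cost)
print(best.label, "|", best.query)
```

In this toy run the plan contributed by the equivalent query has the lowest stand-in cost, so it is returned. When no equivalent query exists, the sketch falls back to the plans of the original query.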