As one of the core components of a relational database management system, the query optimizer plays a key role in guaranteeing the database's query performance. In the big data era, query performance faces higher requirements, and query optimizers face new challenges. The cost estimation module guides the optimizer in selecting an efficient query execution plan, thereby improving query performance, so establishing a sound cost estimation model is of great practical significance. As deep learning technology has matured and become widespread, deep-learning-based database cost estimation has become a research hotspot. Existing solutions fall into two main categories: those based on query statements and those based on query execution plans. The basic idea of the plan-based approach is to use neural networks to build a tree-structured deep learning model for cost estimation that mirrors the structure of the query execution plan. Because this design is closer to the calculation process of traditional cost estimation methods, it has attracted more attention from researchers. However, current plan-based methods have the following shortcomings. First, they do not fully consider the types of factors that influence the optimizer's cost estimation, and they make little use of database statistical information. Second, they cannot support parallel computation while also capturing temporal features and long-term dependencies. Third, they do not fully consider the scenario in which query sets containing new tables are added dynamically while the database is online, and so cannot estimate the cost of query execution plans efficiently and accurately. In response to the above 
issues, this thesis focuses on database cost estimation based on query execution plans, aiming to establish an accurate cost estimation model both in static scenarios and in scenarios where query sets containing new tables are added dynamically. The research faces three challenges. First, how to analyze the structure and content of query execution plans and the impact of database statistical information on cost estimation, and how to extract and encode these influencing factors as features. Second, how to design a neural network model that supports parallel computation and fits the structure and content of query execution plans while capturing their temporal features and long-term dependencies. Third, how to update the parameter weights of the cost estimation model steadily and iteratively when query sets containing new tables are added dynamically, so that cost estimation becomes more accurate. To address these three challenges, this thesis proposes two models: TQRNN-CE (Tree-based Quasi-Recurrent Neural Networks for Cost Estimation), a cost estimation model that applies a tree-based quasi-recurrent neural network to query execution plans, and TQRNN-EWC (Tree-based Quasi-Recurrent Neural Networks with Elastic Weight Consolidation), a cost estimation model that supports the dynamic addition of query sets containing new tables while the database is online. The cost estimation performance of the proposed models is verified on query execution plans output by a real PostgreSQL database environment. The main work and contributions of this thesis are as follows:

1. This thesis proposes TQRNN-CE, a cost estimation model built on a tree-based quasi-recurrent neural network over query execution plans, which supports scenarios with static underlying data and conforms to the structure and content of query execution plans. This method models query execution plan data 
that has tree-structure features and temporal features. It captures the relationship between the tree-shaped temporal data in a query execution plan and the plan's cost, and it makes full use of database statistical information that reflects the distribution of the underlying data, thereby estimating the cost of the query execution plan. The model also supports parallel computation and captures temporal features and long-term dependencies in query execution plans.

2. This thesis proposes TQRNN-EWC, a cost estimation model that incorporates incremental learning and supports the dynamic addition of query sets containing new tables. The model builds on TQRNN-CE. It uses Elastic Weight Consolidation (EWC) as its main incremental learning method, maintains a parameter importance matrix for the TQRNN-CE model, and iteratively updates the model's parameter weights online when query sets containing new tables are added. This lets the model adapt to the new query sets while retaining its fit to the old ones, and estimate costs more quickly and accurately.

3. This thesis conducts extensive experiments on the PostgreSQL database with the real dataset IMDB and the query set JOB to verify the effectiveness of the models. Cost estimation comparison experiments show that TQRNN-CE performs better overall than the baseline models on four evaluation metrics: Q-Error, MAE, RMSE, and MAPE. They also verify that TQRNN-EWC is more effective and stable. Time performance comparison experiments verify the efficiency of TQRNN-CE and the ability of TQRNN-EWC to complete incremental learning in a relatively short time. Ablation experiments verify the effectiveness of each module in the two models.
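
The parameter importance matrix and online weight update described in contribution 2 can be illustrated with the standard EWC penalty. The following is a minimal pure-Python sketch, assuming a diagonal Fisher information estimate as the importance measure. The function names, the toy gradients, and the weighting factor `lam` are hypothetical and are not taken from the thesis, which may compute and apply the importance matrix differently.

```python
def fisher_diagonal(per_sample_grads):
    """Estimate diagonal parameter importance as the mean squared gradient.

    per_sample_grads: list of gradient vectors (one per old-task sample),
    each a list of floats with one entry per model parameter.
    """
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(params, old_params, fisher, lam):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * sum(
        f * (p - q) ** 2 for f, p, q in zip(fisher, params, old_params)
    )


def ewc_loss(new_task_loss, params, old_params, fisher, lam):
    """Loss on the new query set plus the penalty that protects old knowledge."""
    return new_task_loss + ewc_penalty(params, old_params, fisher, lam)


# Toy example (assumed values): parameter 0 mattered for the old query set,
# parameter 1 did not, so only movement in parameter 0 is penalized.
fisher = fisher_diagonal([[1.0, 0.0], [3.0, 0.0]])
penalty = ewc_penalty([2.0, 5.0], [1.0, 1.0], fisher, lam=2.0)
total = ewc_loss(0.5, [2.0, 5.0], [1.0, 1.0], fisher, lam=2.0)
```

In this toy case the second parameter moves far from its old value at no cost because its estimated importance is zero, which is the mechanism that lets a model adapt to new query sets while staying close to the old solution where it matters.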
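
For reference, the four evaluation metrics named in contribution 3 can be computed as in the sketch below. It uses the commonly cited definitions and assumes strictly positive costs. Averaging Q-Error over the query set is an assumption made here for brevity, since the thesis may report the median or percentiles instead; the function names and sample values are illustrative only.

```python
import math


def q_error(estimated, actual):
    """Q-Error of a single estimate: max(est/act, act/est); 1.0 is perfect."""
    return max(estimated / actual, actual / estimated)


def evaluate(estimates, actuals):
    """Return mean Q-Error, MAE, RMSE, and MAPE (in percent) for paired costs."""
    n = len(actuals)
    diffs = [e - a for e, a in zip(estimates, actuals)]
    return {
        "mean_q_error": sum(q_error(e, a) for e, a in zip(estimates, actuals)) / n,
        "mae": sum(abs(d) for d in diffs) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
        "mape": 100.0 * sum(abs(d) / a for d, a in zip(diffs, actuals)) / n,
    }


# Made-up estimated and actual plan costs, for illustration only.
scores = evaluate([110.0, 40.0, 200.0], [100.0, 50.0, 200.0])
```

Q-Error is symmetric in over- and underestimation, which is why it is often preferred over MAPE when costs span several orders of magnitude.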