| Databases are an important foundation for digital infrastructure, and query processing and optimization have long been a central research direction in the database field. Online analytical processing (OLAP) workloads contain a large number of complex analytical queries over many relations. For such queries, traditional optimizers incur a high search cost during join optimization and have difficulty guaranteeing plan quality when the data distribution is nonlinear. In addition, cardinality estimation in industrial-grade databases relies on statistical information together with independence and fixed-constant assumptions, which makes it difficult to capture changes in the current state of the data. Taking PostgreSQL, an open-source relational database, as the research object and focusing on these existing problems in query optimization, this paper proposes a join order optimization method based on a dynamic double DQN and a query cardinality estimation method based on gradient boosting decision trees (GBDT), improving query performance from two perspectives. The main research contents are as follows. (1) In join order optimization, DQN tends to overestimate action values, which limits query performance. In addition, ε-greedy exploration is inefficient and does not support deep exploration. We therefore propose a join order optimization method based on deep reinforcement learning. The join query is first modeled as a Markov decision process, and the neural network model is trained with a weighted double Deep Q-network to improve the prediction accuracy of the trained network. Actions are selected by a dynamic progressive search strategy, which increases the randomness and depth of exploration so that exploration accumulates higher information gain. A tree-structured state encoding preserves the hierarchy among relations and further improves the effectiveness of the method. After the cost of each query plan is estimated, a join plan that fits the data distribution and has a balanced query load is selected. (2) In query cardinality estimation, deep learning models are costly to train offline and lack the generalization ability needed to capture changes in the database. To address these problems, we use a gradient boosting decision tree to predict cardinality by learning the mapping between the relational features of a query's base tables and the estimated cardinality. The model is extended with piecewise linear regression trees, which keeps it lightweight and shortens training time. Generalization is improved by periodically updating the extended model so that it captures information about database changes. In addition, a confidence interval is set by quantile regression to keep cardinality estimates robust within the interval and to avoid abnormal cardinality predictions. Finally, the two optimization methods are applied to the PostgreSQL database. The test results show that, compared with PostgreSQL's original cardinality estimation method, cardinality estimation accuracy improves by about 5 times and average query performance improves by 32.7%. By improving cardinality estimation accuracy, this study extends the generalizability and scalability of the query optimizer and improves the query execution efficiency of the database system. |
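
The overestimation problem in (1) can be illustrated with a minimal sketch of how a training target might be computed. The abstract does not give the weighting rule, so the fixed blend weight `beta`, the function names, and the sample Q-values below are assumptions chosen for illustration only, not the paper's formulation. Plain DQN lets one network both select and evaluate the next action, whereas the double variant selects the action with the online network and evaluates it with a blend of the two networks.

```python
def dqn_target(reward, gamma, q_target_next, done=False):
    """Plain DQN target: the target network both picks and scores the action,
    so noisy high values are systematically favoured."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def weighted_double_dqn_target(reward, gamma, q_online_next, q_target_next,
                               beta=0.5, done=False):
    """Weighted double DQN target (illustrative; fixed beta is an assumption).

    The online network selects the next action; its value is then a blend of
    the online estimate (weight beta) and the target estimate (weight 1 - beta).
    beta = 0 reduces to the standard double DQN target.
    """
    if done:
        return reward
    # Action selection uses the online network only.
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    blended = beta * q_online_next[best] + (1.0 - beta) * q_target_next[best]
    return reward + gamma * blended


# Hypothetical Q-values for three candidate joins in the next state.
q_online_next = [1.0, 3.0, 2.0]
q_target_next = [2.5, 1.0, 4.0]

plain = dqn_target(-1.0, 0.9, q_target_next)
double = weighted_double_dqn_target(-1.0, 0.9, q_online_next, q_target_next, beta=0.0)
weighted = weighted_double_dqn_target(-1.0, 0.9, q_online_next, q_target_next, beta=0.5)
```

In this toy setting the plain target takes the largest target-network value, while the double and weighted variants score the action chosen by the online network, which yields a lower and less optimistic target.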
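
The quantile-regression confidence interval in (2) can be sketched in the same hedged way. The abstract does not say how the interval is applied, so this example assumes that two quantile models produce a lower and an upper bound and that a point estimate falling outside them is clamped. The names `pinball_loss` and `clamp_to_interval` and all numbers are hypothetical. The pinball loss is the standard objective minimized when fitting a model for quantile `q`.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for one sample at quantile level q in (0, 1).

    Under-prediction is penalised with weight q and over-prediction with
    weight 1 - q, so minimising it pushes predictions toward the q-th quantile.
    """
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1.0) * diff


def clamp_to_interval(estimate, lower, upper):
    """Keep a cardinality estimate inside [lower, upper] (assumed policy)."""
    return min(max(estimate, lower), upper)


# With q = 0.9, predicting too low costs more than predicting too high.
under = pinball_loss(10.0, 8.0, 0.9)   # prediction below the true value
over = pinball_loss(10.0, 12.0, 0.9)   # prediction above the true value

# A hypothetical outlier estimate is pulled back to the upper bound.
safe_estimate = clamp_to_interval(5000.0, 80.0, 120.0)
```

Clamping is only one possible way to use the interval; falling back to the optimizer's built-in estimate when a prediction lies outside the bounds would be another.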