| Over the past two decades, data has grown at an unprecedented rate. As a result, systems for storing, processing, and analyzing data have become both common and critical. The key to the performance of a data system is the query optimizer, which transforms high-level declarative queries (such as SQL) into efficient execution plans. However, query optimization is very complex and poses two key challenges. First, optimizers rely on a large number of manually designed heuristics to reduce complexity, and these heuristics often yield suboptimal plans. Second, query optimizers are very expensive to develop: human experts may take several months to write a first version, which can then take years to perfect. To compensate for these shortcomings, machine learning has emerged as a promising direction for improving traditional query optimization, owing to its adaptability and accuracy. Machine learning models can learn from past query execution logs and can be used to predict the optimal execution plan for new queries. Recently, research on Machine Learning for Database (ML4DB) has garnered increasing attention and has demonstrated that traditional database performance can be improved in a data-driven manner. Within this line of work, reinforcement learning has been applied to construct autonomous query optimizers that generate query plans and find competitive plans without relying on a traditional query optimizer. However, these "alternative optimizer" approaches have not yet been deployed in practice, and commercial database vendors still hesitate to incorporate them into their database management systems. This reluctance stems from the fact that the capabilities of existing machine learning methods have been overestimated. Machine learning models are data-driven, which enables them to learn new data distributions and adapt to machine-specific features. Nevertheless, they also pose challenges such as cold start and
the inability to learn the query optimization rules internal to a database. To compensate for these shortcomings of machine learning and to leverage the years of development behind mature databases, this thesis proposes LEON (ML-aidEd query OptimizatioN). LEON improves the self-tuning capability of the expert query optimizer by leveraging both machine learning and the fundamental knowledge of the expert query optimizer, so that the optimizer adapts to a specific deployment environment. To train the machine learning model, this thesis proposes a pairwise ranking objective, which differs substantially from the regression objectives used in previous work. To help the optimizer escape local minima and avoid failures, this thesis proposes an exploration strategy based on ranking and uncertainty that discovers valuable plans. In addition, this thesis proposes machine learning model-guided pruning, which improves planning efficiency without sacrificing much performance. Finally, experiments on a wide range of publicly available datasets demonstrate that the proposed framework can outperform state-of-the-art methods in terms of end-to-end latency, training efficiency, and stability. |
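
To make the contrast between a pairwise ranking objective and a regression objective concrete, the following is a minimal sketch of a generic logistic pairwise ranking loss over candidate plans for one query. The abstract does not give the exact form of LEON's objective, so the loss form, the function name `pairwise_ranking_loss`, and the convention that a lower score means a plan is predicted to be faster are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pairwise_ranking_loss(scores, latencies):
    """Mean logistic loss over all pairs of plans with different latencies.

    Hypothetical helper, not LEON's actual objective.
    scores:    model outputs per plan (assumed: lower = predicted faster)
    latencies: observed execution latencies per plan
    """
    total, pairs = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if latencies[i] == latencies[j]:
            continue  # ties carry no ordering information
        fast, slow = (i, j) if latencies[i] < latencies[j] else (j, i)
        # Positive margin means the model orders this pair correctly.
        margin = scores[slow] - scores[fast]
        # Numerically stable log(1 + exp(-margin)).
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
        pairs += 1
    return total / pairs if pairs else 0.0


# Scores that agree with the latency order give a lower loss
# than scores that reverse it.
good = pairwise_ranking_loss([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
bad = pairwise_ranking_loss([3.0, 2.0, 1.0], [10.0, 20.0, 30.0])
```

Unlike a regression loss, this loss depends only on the relative order of the observed latencies, not on their magnitudes: rescaling every latency by a constant leaves it unchanged.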