Research And Implementation Of Execution Plan Cache Optimization Based On Machine Learning

Posted on: 2022-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
Full Text: PDF
GTID: 2518306764976999
Subject: Automation Technology
Abstract/Summary:
With the advent of the big data era, artificial intelligence, as an important tool for data processing and analysis, has been applied in many fields. As the data carrier for artificial intelligence, the database needs to provide faster and more convenient query services. However, as data volumes grow, traditional database optimization methods face new bottlenecks, so finding a more effective optimization method has become an urgent problem. Artificial intelligence has already been studied extensively for database optimization tasks such as cardinality estimation, join order selection, and execution plan caching. An execution plan cache bypasses the optimizer: it saves the execution plans of historical queries in a cache and then assigns a plan from that cache to each incoming query. While this approach can save query time, there is currently no efficient way to assign execution plans from the cache.

Many studies have tried to improve the accuracy of execution plan caching, but most of them share three problems. First, they lack an efficient way to extract feature vectors. Feature extraction for bind variables is the key to classifying execution plans; if the features of bind variables with categorical properties cannot be extracted, model accuracy inevitably suffers. Second, they have difficulty maintaining the execution plan cache dynamically. When the data in the database changes, a saved execution plan may no longer suit the new parameter space, so the execution plan cache model must be adjusted dynamically to preserve prediction accuracy. Finally, existing models train inefficiently in practical application scenarios. Because database systems serve queries online, they do not have the abundant data and training time available to offline tasks. The model must therefore train in a short time and reach high accuracy with a small number of training samples.

To address these problems, the main work of this thesis is to propose COPC, a machine-learning-based method for execution plan caching. Specifically, COPC introduces a new data encoding method for the parametric query optimization task. Unlike traditional data encoding methods, COPC captures not only the magnitude information of the parameters in the table but also their spatial information. This addresses the problem of insufficient parameter feature extraction, because the magnitude and spatial information together accurately capture the classification features of different parameters. Then, to maintain the execution plan cache dynamically, this thesis proposes an adaptive random forest algorithm based on this encoding method and applies it to the classification task of the execution plan cache. This allows COPC to reach high accuracy using only a small amount of data and training time, and the model supports incremental training.

The proposed model and the baseline model are then tested on public datasets, and the experimental results demonstrate the effectiveness of COPC in terms of efficiency and accuracy. In addition, this thesis conducts comparative experiments against common machine learning classification algorithms, and the results show the advantage of the adaptive random forest algorithm on the execution plan caching task.
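
The abstract does not give the details of COPC's encoding or of its adaptive random forest, so the following is only a minimal sketch of the general idea: encode a bind variable into a small feature vector, then use a classifier that learns one sample at a time to pick a cached plan. Everything here is an assumption made for illustration. `encode_param` uses a min-max scaled value as a stand-in for "magnitude" and an empirical quantile as a stand-in for "spatial" information, `OnlinePlanClassifier` is a simple nearest-centroid model swapped in for the adaptive random forest, and the plan labels `index_scan` and `seq_scan` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def encode_param(value, sorted_column):
    """Hypothetical encoding of one bind variable (not the thesis's scheme).

    Returns (magnitude, position):
      magnitude: value scaled into [0, 1] by the column's min and max
      position:  fraction of column rows <= value (an empirical quantile)
    """
    lo, hi = sorted_column[0], sorted_column[-1]
    magnitude = 0.0 if hi == lo else (value - lo) / (hi - lo)
    magnitude = min(1.0, max(0.0, magnitude))  # clamp out-of-range values
    position = bisect_right(sorted_column, value) / len(sorted_column)
    return (magnitude, position)


class OnlinePlanClassifier:
    """Nearest-centroid classifier updated one sample at a time.

    A deliberately simple stand-in for the adaptive random forest named
    in the abstract; it only illustrates incremental training.
    """

    def __init__(self):
        self.sums = {}                  # plan_id -> per-feature running sums
        self.counts = defaultdict(int)  # plan_id -> number of samples seen

    def learn_one(self, features, plan_id):
        # Incremental update: no retraining over earlier samples is needed.
        if plan_id not in self.sums:
            self.sums[plan_id] = [0.0] * len(features)
        for i, x in enumerate(features):
            self.sums[plan_id][i] += x
        self.counts[plan_id] += 1

    def predict_one(self, features):
        # With no training data, return None so the caller can fall back
        # to the regular optimizer.
        if not self.sums:
            return None

        def squared_distance(plan_id):
            n = self.counts[plan_id]
            return sum((s / n - x) ** 2
                       for s, x in zip(self.sums[plan_id], features))

        return min(self.sums, key=squared_distance)


# Toy column with a skewed value distribution (made-up data).
column = sorted([1, 2, 3, 4, 5, 100, 200, 300, 400, 1000])

clf = OnlinePlanClassifier()
history = [(2, "index_scan"), (3, "index_scan"),
           (400, "seq_scan"), (1000, "seq_scan")]
for value, plan in history:
    clf.learn_one(encode_param(value, column), plan)

predicted_small = clf.predict_one(encode_param(4, column))
predicted_large = clf.predict_one(encode_param(300, column))
```

In this toy setup a small parameter value is routed to the plan learned for selective lookups and a large one to the plan learned for broad scans. A real system would replace the centroid model with a stronger incremental learner and would still need a fallback to the optimizer when the prediction is unreliable.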
Keywords/Search Tags: Query Plan Cache, Query Optimization, Machine Learning